Tree

MatrixTree

class supar.structs.tree.MatrixTree(scores, lens=None, multiroot=False)[source]

MatrixTree for calculating partitions and marginals of non-projective dependency trees in \(O(n^3)\) by an adaptation of Kirchhoff’s MatrixTree Theorem [Koo et al. 2007].

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: True.

Examples

>>> from supar import MatrixTree
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([0.7174, 3.7910], grad_fn=<SumBackward1>)
>>> s1.argmax
tensor([[0, 0, 1, 1, 0],
        [0, 4, 1, 0, 3]])
>>> s1.log_partition
tensor([2.0229, 6.0558], grad_fn=<CopyBackwards>)
>>> s1.log_prob(arcs)
tensor([-3.2209, -2.5756], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.9711, 3.4497], grad_fn=<SubBackward0>)
>>> s1.kl(s2)
tensor([1.3354, 2.6914], grad_fn=<AddBackward0>)
property max

Computes the max score of the distribution \(p(y)\).

property argmax

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

kmax(k)[source]

Computes the k-max of the distribution \(p(y)\).

sample()[source]

Obtains a structured sample from the distribution \(y \sim p(y)\). TODO: multi-sampling.

property entropy

Computes entropy \(H[p]\) of the distribution \(p(y)\).

cross_entropy(other)[source]

Computes cross-entropy \(H[p,q]\) of self and another distribution.

Parameters

other (StructuredDistribution) – Comparison distribution.

kl(other)[source]

Computes KL-divergence \(KL[p \parallel q]=H[p,q]-H[p]\) of self and another distribution.

Parameters

other (StructuredDistribution) – Comparison distribution.

DependencyCRF

class supar.structs.tree.DependencyCRF(scores, lens=None, multiroot=False)[source]

First-order TreeCRF for projective dependency trees [Eisner 2000, Zhang et al. 2020a].

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: True.

Examples

>>> from supar import DependencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([3.6346, 1.7194], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 2, 3, 0, 0],
        [0, 0, 3, 1, 1]])
>>> s1.log_partition
tensor([4.1007, 3.3383], grad_fn=<IndexBackward>)
>>> s1.log_prob(arcs)
tensor([-1.3866, -5.5352], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([0.9979, 2.6056], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([1.6631, 2.6558], grad_fn=<IndexBackward>)
property argmax

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k)[source]

Computes the k-argmax of the distribution \(p(y)\).

Dependency2oCRF

class supar.structs.tree.Dependency2oCRF(scores, lens=None, multiroot=False)[source]

Second-order TreeCRF for projective dependency trees [McDonald & Pereira 2006, Zhang et al. 2020a].

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: True.

Examples

>>> from supar import Dependency2oCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> sibs = torch.tensor([CoNLL.get_sibs(i) for i in arcs[:, 1:].tolist()])
>>> s1 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s2 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s1.max
tensor([0.7574, 3.3634], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 3, 3, 0, 0],
        [0, 4, 4, 4, 0]])
>>> s1.log_partition
tensor([1.9906, 4.3599], grad_fn=<IndexBackward>)
>>> s1.log_prob((arcs, sibs))
tensor([-0.6975, -6.2845], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.6436, 2.1717], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([0.4929, 2.0759], grad_fn=<IndexBackward>)
property argmax

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k)[source]

Computes the k-argmax of the distribution \(p(y)\).

ConstituencyCRF

class supar.structs.tree.ConstituencyCRF(scores, lens=None)[source]

Constituency TreeCRF [Stern et al. 2017, Zhang et al. 2020b].

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all constituents.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking.

Examples

>>> from supar import ConstituencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> charts = torch.tensor([[[0, 1, 0, 1, 0],
                            [0, 0, 1, 1, 0],
                            [0, 0, 0, 1, 0],
                            [0, 0, 0, 0, 0],
                            [0, 0, 0, 0, 0]],
                           [[0, 1, 1, 0, 1],
                            [0, 0, 1, 0, 0],
                            [0, 0, 0, 1, 1],
                            [0, 0, 0, 0, 1],
                            [0, 0, 0, 0, 0]]]).bool()
>>> s1 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([ 2.5068, -0.5628], grad_fn=<IndexBackward>)
>>> s1.argmax
[[[0, 3], [0, 1], [1, 3], [1, 2], [2, 3]], [[0, 4], [0, 2], [0, 1], [1, 2], [2, 4], [2, 3], [3, 4]]]
>>> s1.log_partition
tensor([2.9235, 0.0154], grad_fn=<IndexBackward>)
>>> s1.log_prob(charts)
tensor([-0.4167, -0.5781], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([0.6415, 1.2026], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([0.0362, 2.9017], grad_fn=<IndexBackward>)
property argmax

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k)[source]

Computes the k-argmax of the distribution \(p(y)\).