Dependency Models

BiaffineDependencyModel

class supar.models.dep.BiaffineDependencyModel(n_words, n_rels, n_tags=None, n_chars=None, encoder='lstm', feat=['char'], n_embed=100, n_pretrained=100, n_feat_embed=100, n_char_embed=50, n_char_hidden=100, char_pad_index=0, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, encoder_dropout=0.33, n_arc_mlp=500, n_rel_mlp=100, mlp_dropout=0.33, scale=0, pad_index=0, unk_index=1, **kwargs)

The implementation of the Biaffine Dependency Parser [Dozat & Manning 2017].

Parameters

n_words (int) – The size of the word vocabulary.
n_rels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as RoBERTa can also be used. Default: ['char'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word embeddings. Default: 100.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 100.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many of the last layers to use, required if encoder='bert' or using BERT features. The final output is a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to obtain token embeddings from subtokens. 'first': take the first subtoken. 'last': take the last subtoken. 'mean': take the mean over all subtokens. Default: 'mean'.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_arc_mlp (int) – Arc MLP size. Default: 500.
n_rel_mlp (int) – Label MLP size. Default: 100.
mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.
scale (float) – Scaling factor for affine scores. Default: 0.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
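The three bert_pooling modes can be pictured with a small NumPy sketch. NumPy arrays stand in for PyTorch tensors here, and pool_subtokens, hidden, and lens are hypothetical names introduced for illustration, not part of SuPar. The [seq_len, fix_len, n_hidden] layout with right-padded subtokens is an assumption.

```python
import numpy as np


def pool_subtokens(hidden, lens, pooling="mean"):
    """Collapse subtoken vectors into one vector per token.

    hidden: [seq_len, fix_len, n_hidden] subtoken states (right-padded, assumed layout)
    lens:   [seq_len] number of real subtokens per token, each >= 1
    """
    seq_len, fix_len, _ = hidden.shape
    if pooling == "first":
        return hidden[:, 0]
    if pooling == "last":
        # pick the last real (non-padding) subtoken of every token
        return hidden[np.arange(seq_len), lens - 1]
    if pooling == "mean":
        # average only over real subtokens, ignoring padding slots
        mask = np.arange(fix_len)[None, :] < lens[:, None]
        return (hidden * mask[..., None]).sum(1) / lens[:, None]
    raise ValueError(f"unknown pooling mode: {pooling}")


# two tokens, up to three subtokens each, hidden size 2
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
lens = np.array([2, 3])  # the first token has one padding slot

first_vec = pool_subtokens(hidden, lens, "first")
last_vec = pool_subtokens(hidden, lens, "last")
mean_vec = pool_subtokens(hidden, lens, "mean")
```

Each call returns an array of shape [seq_len, n_hidden], so the three modes are interchangeable from the point of view of the downstream encoder.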

forward(words, feats=None)

Parameters

words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (list[LongTensor]) – A list of feat indices. The size is either [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.

Returns

The first tensor of shape [batch_size, seq_len, seq_len] holds scores of all possible arcs. The second of shape [batch_size, seq_len, seq_len, n_labels] holds scores of all possible labels on each arc.

Return type

Tensor, Tensor
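As a rough picture of where the two score tensors come from, the sketch below applies a biaffine form to MLP outputs. It is written in NumPy rather than PyTorch. The helper biaffine, the weight layout, the bias placement, and the convention that s_arc[b, i, j] scores word j as the head of word i are all assumptions made for illustration, not SuPar's own code.

```python
import numpy as np


def biaffine(x, y, weight, bias_x=True, bias_y=False):
    """Score every (x position, y position) pair with a bilinear form.

    x, y:   [batch_size, seq_len, n_in]
    weight: [n_out, n_in + bias_x, n_in + bias_y]
    returns [batch_size, n_out, seq_len, seq_len]
    """
    if bias_x:
        x = np.concatenate([x, np.ones_like(x[..., :1])], axis=-1)
    if bias_y:
        y = np.concatenate([y, np.ones_like(y[..., :1])], axis=-1)
    return np.einsum("bxi,oij,byj->boxy", x, weight, y)


rng = np.random.default_rng(0)
batch_size, seq_len, n_arc_mlp, n_rel_mlp, n_labels = 2, 5, 8, 4, 3

# stand-ins for the dependent/head representations produced by the MLPs
arc_d, arc_h = rng.normal(size=(2, batch_size, seq_len, n_arc_mlp))
rel_d, rel_h = rng.normal(size=(2, batch_size, seq_len, n_rel_mlp))
w_arc = rng.normal(size=(1, n_arc_mlp + 1, n_arc_mlp))
w_rel = rng.normal(size=(n_labels, n_rel_mlp + 1, n_rel_mlp + 1))

# [batch_size, seq_len, seq_len]
s_arc = biaffine(arc_d, arc_h, w_arc).squeeze(1)
# [batch_size, seq_len, seq_len, n_labels]
s_rel = biaffine(rel_d, rel_h, w_rel, bias_y=True).transpose(0, 2, 3, 1)
```

The final transpose moves the label axis to the end so that s_rel matches the documented shape.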

loss(s_arc, s_rel, arcs, rels, mask, partial=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.
rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
partial (bool) – If True, the trees are treated as partially annotated. Default: False.

Returns

The training loss.

Return type

Tensor
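One plausible reading of this loss is a pair of cross-entropy terms, one over candidate heads and one over labels on the gold arc, averaged over unmasked tokens. The NumPy sketch below follows that reading. The function biaffine_loss is hypothetical, partial annotation is not handled, padded head positions are not excluded from the softmax, and s_arc[b, i, j] is assumed to score word j as the head of word i.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def biaffine_loss(s_arc, s_rel, arcs, rels, mask):
    """Sum of head and label cross-entropy over the tokens selected by mask."""
    b, i = np.nonzero(mask)              # coordinates of the real tokens
    heads = arcs[b, i]                   # gold head of each real token
    rows = np.arange(len(b))
    arc_nll = -log_softmax(s_arc[b, i])[rows, heads]
    # label scores are read off at the gold head only
    rel_nll = -log_softmax(s_rel[b, i, heads])[rows, rels[b, i]]
    return arc_nll.mean() + rel_nll.mean()


batch_size, seq_len, n_labels = 2, 4, 3
s_arc = np.zeros((batch_size, seq_len, seq_len))
s_rel = np.zeros((batch_size, seq_len, seq_len, n_labels))
arcs = np.array([[0, 2, 0, 2], [0, 0, 1, 0]])
rels = np.array([[0, 1, 2, 0], [0, 2, 1, 0]])
mask = np.array([[False, True, True, True], [False, True, True, False]])

# all-zero scores make every head and label equally likely,
# so the loss equals log(seq_len) + log(n_labels)
loss = biaffine_loss(s_arc, s_rel, arcs, rels, mask)
```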

decode(s_arc, s_rel, mask, tree=False, proj=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.

Returns

Predicted arcs and labels of shape [batch_size, seq_len].

Return type

LongTensor, LongTensor
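With tree=False, decoding can be thought of as a greedy argmax: pick the best head for each word, then the best label on that predicted arc. The NumPy sketch below illustrates this under the assumption that s_arc[b, i, j] scores word j as the head of word i. The name greedy_decode is hypothetical, and padded positions are zeroed purely for readability.

```python
import numpy as np


def greedy_decode(s_arc, s_rel, mask):
    """Pick the best head per word, then the best label on that arc."""
    arc_preds = s_arc.argmax(-1)        # [batch_size, seq_len]
    best_rel = s_rel.argmax(-1)         # [batch_size, seq_len, seq_len]
    # keep only the label that belongs to the predicted head
    rel_preds = np.take_along_axis(best_rel, arc_preds[..., None], axis=-1).squeeze(-1)
    return np.where(mask, arc_preds, 0), np.where(mask, rel_preds, 0)


s_arc = np.zeros((1, 3, 3))
s_arc[0, 1, 0] = 5.0       # word 1 prefers head 0
s_arc[0, 2, 1] = 5.0       # word 2 prefers head 1
s_rel = np.zeros((1, 3, 3, 2))
s_rel[0, 1, 0, 1] = 3.0    # label 1 on arc 0 -> 1
s_rel[0, 2, 1, 0] = 3.0    # label 0 on arc 1 -> 2
s_rel[0, 2, 0, 1] = 9.0    # high score on an arc that is not predicted, so it is ignored
mask = np.array([[False, True, True]])

arc_preds, rel_preds = greedy_decode(s_arc, s_rel, mask)
```

A greedy result of this kind may contain cycles, which is why the tree and proj flags exist.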

CRFDependencyModel

class supar.models.dep.CRFDependencyModel(n_words, n_rels, n_tags=None, n_chars=None, encoder='lstm', feat=['char'], n_embed=100, n_pretrained=100, n_feat_embed=100, n_char_embed=50, n_char_hidden=100, char_pad_index=0, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, encoder_dropout=0.33, n_arc_mlp=500, n_rel_mlp=100, mlp_dropout=0.33, scale=0, pad_index=0, unk_index=1, **kwargs)

The implementation of the first-order CRF Dependency Parser [Koo et al. 2007, Ma & Hovy 2017, Zhang et al. 2020a].

Parameters

n_words (int) – The size of the word vocabulary.
n_rels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as RoBERTa can also be used. Default: ['char'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word embeddings. Default: 100.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 100.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many of the last layers to use, required if encoder='bert' or using BERT features. The final output is a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to obtain token embeddings from subtokens. 'first': take the first subtoken. 'last': take the last subtoken. 'mean': take the mean over all subtokens. Default: 'mean'.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_arc_mlp (int) – Arc MLP size. Default: 500.
n_rel_mlp (int) – Label MLP size. Default: 100.
mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.
scale (float) – Scaling factor for affine scores. Default: 0.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
proj (bool) – If True, takes DependencyCRF as inference layer, MatrixTree otherwise. Default: True.
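The "weighted sum of the hidden states" mentioned for n_bert_layers can be sketched as an ELMo-style scalar mix: softmax-normalised scalar weights over the selected layers, times a global scale. Whether SuPar computes it exactly this way is an assumption. The function scalar_mix and the gamma parameter are hypothetical names, and NumPy stands in for PyTorch.

```python
import numpy as np


def scalar_mix(layers, weights, gamma=1.0):
    """Combine layer outputs with softmax-normalised scalar weights.

    layers:  [n_layers, seq_len, n_hidden] hidden states of the last n layers
    weights: [n_layers] unnormalised scalars (learnable in a real model)
    """
    w = np.exp(weights - weights.max())
    w = w / w.sum()
    # contract the layer axis: result has shape [seq_len, n_hidden]
    return gamma * np.tensordot(w, layers, axes=1)


n_bert_layers = 4
# four constant layers with values 1, 2, 3 and 6 for easy inspection
layers = np.stack([np.full((2, 3), v) for v in (1.0, 2.0, 3.0, 6.0)])

# equal weights reduce the mix to a plain average of the layers
mixed = scalar_mix(layers, np.zeros(n_bert_layers))
```

In this picture, mix_dropout would act on the per-layer weights during training, which the sketch leaves out.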

loss(s_arc, s_rel, arcs, rels, mask, mbr=True, partial=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.
rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
mbr (bool) – If True, returns marginals for MBR decoding. Default: True.
partial (bool) – If True, the trees are treated as partially annotated. Default: False.

Returns

The training loss, together with the original arc scores of shape [batch_size, seq_len, seq_len] if mbr=False, or the arc marginals otherwise.

Return type

Tensor, Tensor
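To make the tree-level loss and the arc marginals concrete, the sketch below computes both by brute-force enumeration on a three-word sentence. This is for illustration only: a real implementation relies on dynamic programming or the Matrix-Tree theorem rather than enumeration. The function names are hypothetical, multiple root attachments are allowed, projectivity is not enforced, and s_arc[i, j] is assumed to score word j as the head of word i with index 0 as the pseudo-root.

```python
import itertools

import numpy as np


def is_tree(heads):
    """heads[i - 1] is the head of word i; every word must reach root 0 without a cycle."""
    for i in range(1, len(heads) + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False
            seen.add(j)
            j = heads[j - 1]
    return True


def crf_loss_and_marginals(s_arc, gold):
    """Negative log-likelihood of the gold tree and arc marginals, by enumeration."""
    n = s_arc.shape[0] - 1
    trees = [t for t in itertools.product(range(n + 1), repeat=n) if is_tree(t)]
    scores = np.array([sum(s_arc[i + 1, h] for i, h in enumerate(t)) for t in trees])
    log_z = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    probs = np.exp(scores - log_z)
    marginals = np.zeros_like(s_arc, dtype=float)
    for t, p in zip(trees, probs):
        for i, h in enumerate(t):
            marginals[i + 1, h] += p    # probability mass of trees containing arc h -> i+1
    gold_score = sum(s_arc[i + 1, h] for i, h in enumerate(gold))
    return log_z - gold_score, marginals


# three words plus the pseudo-root; a chain 0 -> 1 -> 2 -> 3 as the gold tree
loss, marginals = crf_loss_and_marginals(np.zeros((4, 4)), gold=(0, 1, 2))
```

With all-zero scores the 16 rooted trees over three words are equally likely, so the loss is log 16 and each word's marginals over heads sum to 1.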

CRF2oDependencyModel

class supar.models.dep.CRF2oDependencyModel(n_words, n_rels, n_tags=None, n_chars=None, encoder='lstm', feat=['char'], n_embed=100, n_pretrained=100, n_feat_embed=100, n_char_embed=50, n_char_hidden=100, char_pad_index=0, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, encoder_dropout=0.33, n_arc_mlp=500, n_sib_mlp=100, n_rel_mlp=100, mlp_dropout=0.33, scale=0, pad_index=0, unk_index=1, **kwargs)

The implementation of the second-order CRF Dependency Parser [Zhang et al. 2020a].

Parameters

n_words (int) – The size of the word vocabulary.
n_rels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as RoBERTa can also be used. Default: ['char'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word embeddings. Default: 100.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 100.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many of the last layers to use, required if encoder='bert' or using BERT features. The final output is a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to obtain token embeddings from subtokens. 'first': take the first subtoken. 'last': take the last subtoken. 'mean': take the mean over all subtokens. Default: 'mean'.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_arc_mlp (int) – Arc MLP size. Default: 500.
n_sib_mlp (int) – Sibling MLP size. Default: 100.
n_rel_mlp (int) – Label MLP size. Default: 100.
mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.
scale (float) – Scaling factor for affine scores. Default: 0.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.

forward(words, feats=None)

Parameters

words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (list[LongTensor]) – A list of feat indices. The size is either [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.

Returns

Scores of all possible arcs ([batch_size, seq_len, seq_len]), dependent-head-sibling triples ([batch_size, seq_len, seq_len, seq_len]) and all possible labels on each arc ([batch_size, seq_len, seq_len, n_labels]).

Return type

Tensor, Tensor, Tensor
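The extra sibling tensor can be pictured as a trilinear score over (dependent, head, sibling) triples. The NumPy sketch below uses a plain trilinear form without bias terms. The helper triaffine, the weight layout, and the three input representations are assumptions for illustration, and the library's own layer may differ in both biases and index order.

```python
import numpy as np


def triaffine(dep, head, sib, weight):
    """Score every (dependent i, head j, sibling k) triple.

    dep, head, sib: [batch_size, seq_len, n_in]
    weight:         [n_in, n_in, n_in]
    returns         [batch_size, seq_len, seq_len, seq_len]
    """
    return np.einsum("nia,njc,nke,ace->nijk", dep, head, sib, weight)


rng = np.random.default_rng(0)
batch_size, seq_len, n_sib_mlp = 2, 4, 6

# stand-ins for the three MLP outputs that feed the sibling scorer
sib_d, sib_h, sib_s = rng.normal(size=(3, batch_size, seq_len, n_sib_mlp))
w_sib = rng.normal(size=(n_sib_mlp, n_sib_mlp, n_sib_mlp))

s_sib = triaffine(sib_d, sib_h, sib_s, w_sib)
```

The cubic growth in seq_len is the main cost of second-order models compared with the arc-only tensor.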

loss(s_arc, s_sib, s_rel, arcs, sibs, rels, mask, mbr=True, partial=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_sib (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-sibling triples.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.
sibs (LongTensor) – [batch_size, seq_len, seq_len]. The tensor of gold-standard siblings.
rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
mbr (bool) – If True, returns marginals for MBR decoding. Default: True.
partial (bool) – If True, the trees are treated as partially annotated. Default: False.

Returns

The training loss, together with the original arc scores of shape [batch_size, seq_len, seq_len] if mbr=False, or the arc marginals otherwise.

Return type

Tensor, Tensor

decode(s_arc, s_sib, s_rel, mask, tree=False, mbr=True, proj=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_sib (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-sibling triples.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: False.

Returns

Predicted arcs and labels of shape [batch_size, seq_len].

Return type

LongTensor, LongTensor
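What tree=True and proj=True guarantee can be stated as two checks on a predicted head sequence. The functions below are hypothetical helpers, not SuPar APIs. Here heads is assumed to be a list in which heads[i - 1] is the head of word i and 0 denotes the pseudo-root. Whether the library additionally enforces a single root is not stated in this section, so the check below does not require it.

```python
def is_tree(heads):
    """True if every word reaches the pseudo-root 0 without running into a cycle."""
    for i in range(1, len(heads) + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False
            seen.add(j)
            j = heads[j - 1]
    return True


def is_projective(heads):
    """True if no two arcs cross when drawn above the sentence."""
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, 1)]
    for l1, r1 in spans:
        for l2, r2 in spans:
            if l1 < l2 < r1 < r2:   # the second arc starts inside the first and ends outside
                return False
    return True


projective_tree = [2, 0, 2]       # word 2 is the root, words 1 and 3 attach to it
crossing_tree = [3, 4, 0, 3]      # arcs 3 -> 1 and 4 -> 2 cross each other
cyclic_heads = [2, 1]             # words 1 and 2 point at each other

checks = {
    "projective_tree": (is_tree(projective_tree), is_projective(projective_tree)),
    "crossing_tree": (is_tree(crossing_tree), is_projective(crossing_tree)),
    "cyclic_heads": (is_tree(cyclic_heads),),
}
```

The second example is a valid tree that is nevertheless non-projective, which is the case that separates the two flags.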

VIDependencyModel

class supar.models.dep.VIDependencyModel(n_words, n_rels, n_tags=None, n_chars=None, encoder='lstm', feat=['char'], n_embed=100, n_pretrained=100, n_feat_embed=100, n_char_embed=50, n_char_hidden=100, char_pad_index=0, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, encoder_dropout=0.33, n_arc_mlp=500, n_sib_mlp=100, n_rel_mlp=100, mlp_dropout=0.33, scale=0, inference='mfvi', max_iter=3, pad_index=0, unk_index=1, **kwargs)

The implementation of the Dependency Parser using Variational Inference [Wang & Tu 2020].

Parameters

n_words (int) – The size of the word vocabulary.
n_rels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as RoBERTa can also be used. Default: ['char'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word embeddings. Default: 100.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 100.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many of the last layers to use, required if encoder='bert' or using BERT features. The final output is a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to obtain token embeddings from subtokens. 'first': take the first subtoken. 'last': take the last subtoken. 'mean': take the mean over all subtokens. Default: 'mean'.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_arc_mlp (int) – Arc MLP size. Default: 500.
n_sib_mlp (int) – Binary factor MLP size. Default: 100.
n_rel_mlp (int) – Label MLP size. Default: 100.
mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.
scale (float) – Scaling factor for affine scores. Default: 0.
inference (str) – Approximate inference method. Default: 'mfvi'.
max_iter (int) – Maximum number of inference iterations. Default: 3.
interpolation (float) – Constant to even out the label/edge loss. Default: .1.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
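The roles of inference='mfvi' and max_iter can be illustrated with a small mean-field loop for a single sentence. This is a sketch under assumptions, not the library's routine: the function mfvi, the (dependent, head, sibling) index order of s_sib, and the exclusion of triples where the sibling coincides with the dependent or the head are all choices made here for illustration, and NumPy stands in for PyTorch.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def mfvi(s_arc, s_sib, max_iter=3):
    """Mean-field posteriors over heads.

    s_arc: [seq_len, seq_len], s_arc[i, j] scores j as the head of i
    s_sib: [seq_len, seq_len, seq_len], s_sib[i, j, k] scores arcs j -> i and j -> k together
    """
    seq_len = s_arc.shape[0]
    idx = np.arange(seq_len)
    # a sibling must differ from both the dependent and the head
    valid = (idx[:, None, None] != idx[None, None, :]) & (idx[None, :, None] != idx[None, None, :])
    s_sib = np.where(valid, s_sib, 0.0)
    q = s_arc
    for _ in range(max_iter):
        p = softmax(q, axis=-1)                          # p[k, j]: belief that j heads k
        q = s_arc + np.einsum("ijk,kj->ij", s_sib, p)    # add expected sibling scores
    return softmax(q, axis=-1)


rng = np.random.default_rng(0)
seq_len = 4
s_arc = rng.normal(size=(seq_len, seq_len))

# without sibling factors the posterior is just a softmax over arc scores
no_sib = mfvi(s_arc, np.zeros((seq_len, seq_len, seq_len)))
with_sib = mfvi(s_arc, rng.normal(size=(seq_len, seq_len, seq_len)), max_iter=3)
```

Each iteration refines the head posteriors using the current beliefs about sibling arcs, so max_iter trades accuracy of the approximation against compute.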

forward(words, feats=None)

Parameters

words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (list[LongTensor]) – A list of feat indices. The size is either [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.

Returns

Scores of all possible arcs ([batch_size, seq_len, seq_len]), dependent-head-sibling triples ([batch_size, seq_len, seq_len, seq_len]) and all possible labels on each arc ([batch_size, seq_len, seq_len, n_labels]).

Return type

Tensor, Tensor, Tensor

loss(s_arc, s_sib, s_rel, arcs, rels, mask)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_sib (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-sibling triples.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.
rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.

Returns

The training loss.

Return type

Tensor

decode(s_arc, s_rel, mask, tree=False, proj=False)

Parameters

s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.

Returns

Predicted arcs and labels of shape [batch_size, seq_len].

Return type

LongTensor, LongTensor
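The mask argument used throughout can be derived from the word indices and pad_index. The sketch below shows one way to build it in NumPy. The helper build_mask is hypothetical, and excluding the first position assumes that each sentence starts with a BOS or pseudo-root token, which this section does not state.

```python
import numpy as np


def build_mask(words, pad_index=0, skip_first=True):
    """Boolean mask that is True on real tokens and False on padding."""
    mask = words != pad_index
    if skip_first:
        # assumed convention: position 0 holds a BOS/pseudo-root token with no head of its own
        mask[:, 0] = False
    return mask


# two sentences padded to length 4; index 0 is the padding token
words = np.array([[2, 5, 7, 9],
                  [2, 4, 0, 0]])
mask = build_mask(words)
n_tokens = mask.sum(1)   # number of scored tokens per sentence
```

Predictions at positions where the mask is False carry no meaning and should be ignored when reading the decoded arcs and labels.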

SuPar 1.1.4 documentation
