Semantic Dependency Models
BiaffineSemanticDependencyModel
- class supar.models.sdp.BiaffineSemanticDependencyModel(n_words, n_labels, n_tags=None, n_chars=None, n_lemmas=None, encoder='lstm', feat=['tag', 'char', 'lemma'], n_embed=100, n_pretrained=125, n_feat_embed=100, n_char_embed=50, n_char_hidden=400, char_pad_index=0, char_dropout=0.33, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.2, n_lstm_hidden=600, n_lstm_layers=3, encoder_dropout=0.33, n_edge_mlp=600, n_label_mlp=600, edge_mlp_dropout=0.25, label_mlp_dropout=0.33, interpolation=0.1, pad_index=0, unk_index=1, **kwargs)
The implementation of the Biaffine Semantic Dependency Parser [Dozat & Manning 2018].
- Parameters
n_words (int) – The size of the word vocabulary.
n_labels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
n_lemmas (int) – The number of lemmas, required if lemma embeddings are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'lemma': Lemma embeddings. 'bert': BERT representations; other pretrained language models like RoBERTa are also feasible. Default: ['tag', 'char', 'lemma'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word representations. Default: 125.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 400.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
char_dropout (float) – The dropout ratio of CharLSTM layers, required if using CharLSTM. Default: .33.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many last layers to use, required if encoder='bert' or using BERT features. The final outputs are a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to get token embeddings. first: take the first subtoken. last: take the last subtoken. mean: take a mean over all subtokens. Default: mean.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .2.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 600.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_edge_mlp (int) – Edge MLP size. Default: 600.
n_label_mlp (int) – Label MLP size. Default: 600.
edge_mlp_dropout (float) – The dropout ratio of edge MLP layers. Default: .25.
label_mlp_dropout (float) – The dropout ratio of label MLP layers. Default: .33.
interpolation (float) – Constant to even out the label/edge loss. Default: .1.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
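The three bert_pooling options listed above each reduce the subtoken vectors of one word to a single vector. The following is a minimal, framework-free sketch of that reduction; the function name pool_subtokens and the plain-list representation are assumptions made for illustration and are not part of supar.

```python
def pool_subtokens(subtokens, mode="mean"):
    """Reduce a word's subtoken vectors (a list of equal-length lists) to one vector.

    Hypothetical helper, not a supar function.
    """
    if not subtokens:
        raise ValueError("a word needs at least one subtoken")
    if mode == "first":
        return list(subtokens[0])
    if mode == "last":
        return list(subtokens[-1])
    if mode == "mean":
        n = len(subtokens)
        # zip(*subtokens) iterates over dimensions, so each `dim` holds one
        # coordinate of every subtoken vector.
        return [sum(dim) / n for dim in zip(*subtokens)]
    raise ValueError(f"unknown pooling mode: {mode!r}")


# A made-up word split into three subtokens with 2-dimensional vectors.
word = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first = pool_subtokens(word, "first")  # vector of the first subtoken
last = pool_subtokens(word, "last")    # vector of the last subtoken
mean = pool_subtokens(word, "mean")    # element-wise average
```

In a real model the same reduction would run on batched tensors, with padded subtoken positions excluded from the mean.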
- forward(words, feats=None)
- Parameters
words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (list[LongTensor]) – A list of feat indices. The size is either [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.
- Returns
The first tensor of shape [batch_size, seq_len, seq_len, 2] holds scores of all possible edges. The second of shape [batch_size, seq_len, seq_len, n_labels] holds scores of all possible labels on each edge.
- Return type
Tensor, Tensor
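To make the two score shapes concrete, here is a small sketch of how predictions could be read off them for a single sentence. It assumes that index 1 of the last edge dimension means "edge present" and that absent edges are marked with -1; both conventions, and the helper name decode, are assumptions for illustration rather than documented behaviour.

```python
def decode(s_edge, s_label):
    """Return a seq_len x seq_len grid of predicted label indices.

    s_edge[i][j] is a pair of scores (assumed: [no edge, edge]);
    s_label[i][j] is a list of n_labels scores.
    Cells without a predicted edge are set to -1.
    """
    def argmax(scores):
        return max(range(len(scores)), key=scores.__getitem__)

    return [
        [argmax(s_label[i][j]) if argmax(s_edge[i][j]) == 1 else -1
         for j in range(len(s_edge[i]))]
        for i in range(len(s_edge))
    ]


# Made-up scores for a sentence with seq_len=2 and n_labels=3.
s_edge = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.3, 0.7], [0.6, 0.4]]]
s_label = [[[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]],
           [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]]
preds = decode(s_edge, s_label)
```

With batched tensors the same logic amounts to an argmax over the last dimension of each tensor, followed by masking the label predictions wherever no edge is predicted.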
- loss(s_edge, s_label, labels, mask)
- Parameters
s_edge (Tensor) – [batch_size, seq_len, seq_len, 2]. Scores of all possible edges.
s_label (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each edge.
labels (LongTensor) – [batch_size, seq_len, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
- Returns
The training loss.
- Return type
Tensor
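The interpolation constant balances the edge and label terms of the training loss. A plausible reading, following the weighting described by Dozat & Manning (2018), is that the label loss is weighted by interpolation and the edge loss by 1 - interpolation. The sketch below illustrates that assumption with plain floats; the function names are hypothetical and the exact combination used by the model should be checked against its source.

```python
import math


def cross_entropy(scores, gold):
    """Negative log-likelihood of class `gold` under a softmax over `scores`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]


def interpolated_loss(edge_loss, label_loss, interpolation=0.1):
    """Assumed weighting: interpolation on labels, the remainder on edges."""
    return interpolation * label_loss + (1 - interpolation) * edge_loss


# Made-up per-term losses: 0.1 * 4.0 + 0.9 * 2.0, i.e. about 2.2.
loss = interpolated_loss(edge_loss=2.0, label_loss=4.0)
```

Under this reading the default of 0.1 puts most of the weight on predicting whether an edge exists, with label prediction as the smaller term.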
VISemanticDependencyModel
- class supar.models.sdp.VISemanticDependencyModel(n_words, n_labels, n_tags=None, n_chars=None, n_lemmas=None, encoder='lstm', feat=['tag', 'char', 'lemma'], n_embed=100, n_pretrained=125, n_feat_embed=100, n_char_embed=50, n_char_hidden=100, char_pad_index=0, char_dropout=0, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, freeze=True, embed_dropout=0.2, n_lstm_hidden=600, n_lstm_layers=3, encoder_dropout=0.33, n_edge_mlp=600, n_pair_mlp=150, n_label_mlp=600, edge_mlp_dropout=0.25, pair_mlp_dropout=0.25, label_mlp_dropout=0.33, inference='mfvi', max_iter=3, interpolation=0.1, pad_index=0, unk_index=1, **kwargs)
The implementation of the Semantic Dependency Parser using Variational Inference [Wang et al. 2019].
- Parameters
n_words (int) – The size of the word vocabulary.
n_labels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
n_lemmas (int) – The number of lemmas, required if lemma embeddings are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (list[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'lemma': Lemma embeddings. 'bert': BERT representations; other pretrained language models like RoBERTa are also feasible. Default: ['tag', 'char', 'lemma'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word embeddings. Default: 125.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 100.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
char_dropout (float) – The dropout ratio of CharLSTM layers, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many last layers to use, required if encoder='bert' or using BERT features. The final outputs are a weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling method used to get token embeddings. first: take the first subtoken. last: take the last subtoken. mean: take a mean over all subtokens. Default: mean.
bert_pad_index (int) – The index of the padding token in BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
freeze (bool) – If True, freezes BERT parameters, required if using BERT features. Default: True.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .2.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 600.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_edge_mlp (int) – Unary factor MLP size. Default: 600.
n_pair_mlp (int) – Binary factor MLP size. Default: 150.
n_label_mlp (int) – Label MLP size. Default: 600.
edge_mlp_dropout (float) – The dropout ratio of unary edge factor MLP layers. Default: .25.
pair_mlp_dropout (float) – The dropout ratio of binary factor MLP layers. Default: .25.
label_mlp_dropout (float) – The dropout ratio of label MLP layers. Default: .33.
inference (str) – The approximate inference method. Default: mfvi.
max_iter (int) – The maximum number of inference iterations. Default: 3.
interpolation (float) – Constant to even out the label/edge loss. Default: .1.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
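The n_bert_layers entry above says that the final BERT outputs are a weighted sum of the hidden states of the last layers. One common way to build such a sum is a scalar mix: softmax-normalised learnable weights, optionally scaled by a single factor. The sketch below shows that idea for one token with plain lists; the function name scalar_mix, the softmax normalisation and the gamma scale are assumptions for illustration, not a statement of how the model implements it.

```python
import math


def scalar_mix(layers, weights, gamma=1.0):
    """Combine per-layer vectors for one token into a single vector.

    layers:  list of equal-length vectors, one per selected layer.
    weights: one raw (unnormalised) weight per layer.
    """
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax over the layer weights
    size = len(layers[0])
    return [gamma * sum(p * layer[d] for p, layer in zip(probs, layers))
            for d in range(size)]


# Two made-up layer outputs; equal raw weights reduce to a plain average.
mixed = scalar_mix([[1.0, 2.0], [3.0, 4.0]], weights=[0.0, 0.0])
```

Because the weights are normalised, raising one layer's raw weight shifts the mix toward that layer without changing the overall scale.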
- forward(words, feats=None)
- Parameters
words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (list[LongTensor]) – A list of feat indices. The size is either [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.
- Returns
The first and last tensors hold scores of all possible edges, of shape [batch_size, seq_len, seq_len], and scores of all possible labels on each edge, of shape [batch_size, seq_len, seq_len, n_labels]. The others hold scores of second-order sibling, coparent and grandparent factors, each of shape [batch_size, seq_len, seq_len, seq_len].
- Return type
Tensor, Tensor, Tensor, Tensor, Tensor
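The first-order edge scores and the second-order factor scores are combined by approximate inference, which by default is mean-field variational inference run for max_iter iterations. The sketch below shows the general shape of such an update for binary edge variables with a single second-order term. The index layout of s_pair, the initialisation and the function names are all assumptions made for illustration; the model itself uses three factor types (sibling, coparent, grandparent) and its own tensor layout.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(s_edge, s_pair, max_iter=3):
    """Approximate edge marginals with synchronous mean-field updates.

    s_edge[i][j]:    unary score of edge (i, j).
    s_pair[i][j][k]: hypothetical score for edges (i, j) and (i, k)
                     being present together.
    """
    n = len(s_edge)
    # Start from the marginals implied by the unary scores alone.
    q = [[sigmoid(s) for s in row] for row in s_edge]
    for _ in range(max_iter):
        # Each edge collects the pair scores, weighted by the current
        # belief in the other edge. The comprehension reads the old q
        # throughout, so all edges are updated simultaneously.
        q = [[sigmoid(s_edge[i][j]
                      + sum(q[i][k] * s_pair[i][j][k]
                            for k in range(n) if k != j))
              for j in range(n)]
             for i in range(n)]
    return q


# Made-up scores: neutral unary scores, mildly positive pair scores.
s_edge = [[0.0, 0.0], [0.0, 0.0]]
s_pair = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
marginals = mean_field(s_edge, s_pair)
```

With all pair scores at zero the result is just the sigmoid of the unary scores; positive pair scores push related edges toward being predicted together.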
- loss(s_edge, s_sib, s_cop, s_grd, s_label, labels, mask)
- Parameters
s_edge (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible edges.
s_sib (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-sibling triples.
s_cop (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-coparent triples.
s_grd (Tensor) – [batch_size, seq_len, seq_len, seq_len]. Scores of all possible dependent-head-grandparent triples.
s_label (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each edge.
labels (LongTensor) – [batch_size, seq_len, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
- Returns
The training loss and the edge marginals of shape [batch_size, seq_len, seq_len].
- Return type
Tensor, Tensor
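The marginals returned alongside the loss give, for each pair of positions, an approximate probability that an edge exists. A simple way to turn them into predictions is to threshold them while ignoring padded positions. The sketch below does this for one sentence; the 0.5 threshold, the way the token mask is expanded to pairs, and the function name are assumptions for illustration.

```python
def edges_from_marginals(marginals, mask, threshold=0.5):
    """List the (i, j) pairs whose marginal reaches the threshold.

    marginals: seq_len x seq_len grid of edge probabilities.
    mask:      one boolean per token, False for padding.
    A pair is considered only if both of its tokens are unpadded.
    """
    n = len(mask)
    return [(i, j)
            for i in range(n)
            for j in range(n)
            if mask[i] and mask[j] and marginals[i][j] >= threshold]


# Made-up marginals for three positions, the last of which is padding.
marginals = [[0.1, 0.9, 0.8],
             [0.6, 0.2, 0.7],
             [0.9, 0.9, 0.9]]
mask = [True, True, False]
edges = edges_from_marginals(marginals, mask)
```

Labels for the selected pairs can then be taken as the argmax over the last dimension of the label scores.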