Embedding Layers

TransformerEmbedding

class supar.modules.pretrained.TransformerEmbedding(model, n_layers, n_out=0, stride=256, pooling='mean', pad_index=0, dropout=0, requires_grad=False)

Bidirectional transformer embeddings of words from various transformer architectures [Devlin et al. 2019].

Parameters

model (str) – Path or name of a pretrained model registered in transformers, e.g., 'bert-base-cased'.
n_layers (int) – The number of BERT layers to use. If 0, uses all layers.
n_out (int) – The requested size of the embeddings. If 0, uses the size of the pretrained embedding model. Default: 0.
stride (int) – A sequence longer than the maximum length is split into several smaller pieces with a window size of stride. Default: 256.
pooling (str) – Pooling strategy for turning subword-piece embeddings into a single token embedding: 'first' takes the first subtoken, 'last' takes the last subtoken, and 'mean' averages over all subtokens. Default: 'mean'.
pad_index (int) – The index of the padding token in the BERT vocabulary. Default: 0.
dropout (float) – The dropout ratio of BERT layers, passed to the ScalarMix layer. Default: 0.
requires_grad (bool) – If True, the model parameters will be updated together with the downstream task. Default: False.
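The stride parameter can be illustrated with a minimal pure-Python sketch that only computes index windows. It assumes a model length limit named max_len (a hypothetical name; BERT-style models typically use 512), that each later window starts stride positions after the previous one, and that each window contributes only the positions not already covered. The function split_into_windows is hypothetical and illustrates the windowing idea rather than reproducing SuPar's source.

```python
from typing import List, Tuple


def split_into_windows(n_pieces: int, max_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Cover n_pieces subword positions with windows of at most max_len.

    Returns (start, end, keep_from) triples: the model would encode
    pieces[start:end], and only outputs from offset keep_from onward are kept,
    so each position is embedded exactly once (assumed scheme).
    """
    if not 0 < stride <= max_len:
        raise ValueError("expected 0 < stride <= max_len")
    if n_pieces <= max_len:
        # short sequences fit in a single window
        return [(0, n_pieces, 0)]
    windows = [(0, max_len, 0)]
    covered = max_len  # positions [0, covered) already have outputs
    start = stride
    while covered < n_pieces:
        end = min(start + max_len, n_pieces)
        # the overlap [start, covered) only serves as left context
        windows.append((start, end, covered - start))
        covered = end
        start += stride
    return windows
```

For example, with 600 subword pieces, max_len=512 and stride=256, the sketch yields (0, 512, 0) and (256, 600, 256). The second window re-reads 256 positions of left context and contributes only pieces 512 to 599.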

forward(subwords)

Parameters: subwords (Tensor) – [batch_size, seq_len, fix_len].
Returns: BERT embeddings of shape [batch_size, seq_len, n_out].
Return type: Tensor
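Each word in subwords occupies one row of fix_len subword ids, padded with pad_index. The following is a minimal sketch of the three pooling options for a single such row. It assumes that plain Python lists stand in for tensors, that padded positions are ignored, and that an all-padding row pools to a zero vector. The helper pool_subwords is hypothetical.

```python
from typing import List, Sequence


def pool_subwords(ids: Sequence[int], vecs: Sequence[Sequence[float]],
                  pad_index: int = 0, pooling: str = "mean") -> List[float]:
    """Pool one word's subword-piece vectors into a single vector.

    ids:  the word's fix_len subword ids, padded with pad_index
    vecs: one embedding per position in ids
    """
    # keep only the real (non-padding) pieces of this word
    real = [v for i, v in zip(ids, vecs) if i != pad_index]
    if not real:  # assumption: an all-padding row pools to a zero vector
        return [0.0] * len(vecs[0])
    if pooling == "first":
        return list(real[0])
    if pooling == "last":
        return list(real[-1])
    if pooling == "mean":
        # average each embedding dimension over the real pieces
        return [sum(col) / len(real) for col in zip(*real)]
    raise ValueError(f"unknown pooling: {pooling!r}")
```

For a word split into two pieces with vectors [1.0, 2.0] and [3.0, 4.0] and padded to fix_len=4, 'first' gives [1.0, 2.0], 'last' gives [3.0, 4.0], and 'mean' gives [2.0, 3.0].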

ELMoEmbedding

class supar.modules.pretrained.ELMoEmbedding(model='original_5b', bos_eos=(True, True), n_out=0, dropout=0.5, requires_grad=False)

Contextual word embeddings using a word-level bidirectional language model [Peters et al. 2018].

Parameters

model (str) – The name of the pretrained ELMo registered in OPTION and WEIGHT. Default: 'original_5b'.
bos_eos (tuple[bool]) – A tuple of two boolean values indicating whether to keep start/end boundaries of sentence outputs. Default: (True, True).
n_out (int) – The requested size of the embeddings. If 0, uses the default size of ELMo outputs. Default: 0.
dropout (float) – The dropout ratio for the ELMo layer. Default: 0.5.
requires_grad (bool) – If True, the model parameters will be updated together with the downstream task. Default: False.
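To make bos_eos concrete, here is a minimal sketch on a single sentence. It assumes that the per-sentence outputs are a list whose first and last entries correspond to the sentence-start and sentence-end boundaries. The helper trim_boundaries is hypothetical, and the real module operates on tensors.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def trim_boundaries(outputs: Sequence[T], bos_eos: Tuple[bool, bool] = (True, True)) -> List[T]:
    """Drop the start/end boundary positions of one sentence's outputs.

    Assumes outputs[0] is the start boundary and outputs[-1] the end boundary.
    """
    keep_bos, keep_eos = bos_eos
    start = 0 if keep_bos else 1
    end = len(outputs) if keep_eos else len(outputs) - 1
    return list(outputs[start:end])
```

With the default (True, True), every position is kept. With (False, False), only the word positions between the two boundaries remain.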

forward(chars)

Parameters: chars (Tensor) – [batch_size, seq_len, fix_len].
Returns: ELMo embeddings of shape [batch_size, seq_len, n_out].
Return type: Tensor
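The chars input packs every word of a batch into a fixed-width grid. The following minimal sketch builds such a [batch_size, seq_len, fix_len] grid with nested lists. It assumes a hypothetical char_vocab mapping with pad index 0 and unknown index 1. The character encoding that the pretrained ELMo actually expects may differ, and chars_to_grid is a hypothetical helper.

```python
from typing import Dict, List


def chars_to_grid(batch: List[List[str]], char_vocab: Dict[str, int],
                  pad_index: int = 0, unk_index: int = 1) -> List[List[List[int]]]:
    """Return a [batch_size, seq_len, fix_len] nested list of character ids."""
    seq_len = max(len(sent) for sent in batch)                   # longest sentence
    fix_len = max(len(word) for sent in batch for word in sent)  # longest word
    grid = []
    for sent in batch:
        rows = []
        for word in sent:
            ids = [char_vocab.get(c, unk_index) for c in word]
            rows.append(ids + [pad_index] * (fix_len - len(ids)))  # pad the word
        # pad the sentence with fresh all-padding word rows
        rows.extend([pad_index] * fix_len for _ in range(seq_len - len(sent)))
        grid.append(rows)
    return grid
```

For the batch [["ab", "c"], ["abc"]] with char_vocab {"a": 2, "b": 3, "c": 4}, seq_len is 2 and fix_len is 3. The second sentence therefore receives one extra row of padding.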

ScalarMix

class supar.modules.pretrained.ScalarMix(n_layers, dropout=0)

Computes a parameterized scalar mixture of \(N\) tensors, \(\mathrm{mixture} = \gamma \sum_{k=1}^{N} s_k \cdot \mathrm{tensor}_k\), where \(s = \mathrm{softmax}(w)\), and \(w\) (one scalar per tensor) and \(\gamma\) are learnable scalar parameters.
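Below is a worked pure-Python version of the mixture formula. It assumes each tensor is flattened into a list of floats and that w and gamma are supplied as fixed values rather than trained parameters. The names softmax and scalar_mix are illustrative.

```python
import math
from typing import List, Sequence


def softmax(w: Sequence[float]) -> List[float]:
    # subtract the max for numerical stability; the result is unchanged
    m = max(w)
    exps = [math.exp(x - m) for x in w]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(tensors: Sequence[Sequence[float]], w: Sequence[float],
               gamma: float = 1.0) -> List[float]:
    """mixture = gamma * sum_k s_k * tensor_k, with s = softmax(w)."""
    if len(tensors) != len(w):
        raise ValueError("need one weight per tensor")
    s = softmax(w)
    dim = len(tensors[0])
    # mix element-wise across the N tensors, then scale by gamma
    return [gamma * sum(s_k * t[i] for s_k, t in zip(s, tensors)) for i in range(dim)]
```

With w = [0.0, 0.0], the softmax gives s = [0.5, 0.5]. Mixing [1.0, 2.0] and [3.0, 4.0] with gamma = 2.0 therefore yields [4.0, 6.0].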

Parameters

n_layers (int) – The number of layers to be mixed, i.e., \(N\).
dropout (float) – The dropout ratio of the layer weights. If dropout > 0, each scalar weight is dropped with probability dropout by setting its unnormalized weight to -inf. Its softmax mass then becomes 0 and is effectively redistributed to the remaining weights. Default: 0.
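The layer-weight dropout described above can be sketched with plain lists for w and an explicit random generator. The function names are hypothetical, and the sketch follows the description given here rather than SuPar's source. The description does not say what happens when every weight is dropped, so the sketch falls back to the undropped softmax (an assumption).

```python
import math
import random
from typing import List, Sequence


def masked_softmax(w: Sequence[float], drop_mask: Sequence[bool]) -> List[float]:
    # dropped weights become -inf, so exp(-inf) == 0.0 removes their mass
    masked = [-math.inf if d else x for x, d in zip(w, drop_mask)]
    if all(drop_mask):  # assumption: nothing left to renormalize, keep all weights
        masked = list(w)
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def layer_dropout_weights(w: Sequence[float], dropout: float,
                          rng: random.Random) -> List[float]:
    # drop each scalar weight independently with probability `dropout`
    drop_mask = [rng.random() < dropout for _ in w]
    return masked_softmax(w, drop_mask)
```

For example, masked_softmax([0.0, 0.0, 0.0, 0.0], [False, True, False, False]) returns roughly [0.333, 0.0, 0.333, 0.333]. The quarter of the mass that the second weight would have received is spread evenly over the other three.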

forward(tensors)

Parameters: tensors (list[Tensor]) – \(N\) tensors to be mixed.
Returns: The mixture of \(N\) tensors.

SuPar 1.1.4 documentation