Vocab

class supar.utils.vocab.Vocab(counter, min_freq=1, specials=[], unk_index=0)

Defines a vocabulary object that will be used to numericalize a field.

Parameters
  • counter (Counter) – Counter object holding the frequencies of each value found in the data.

  • min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 1.

  • specials (list[str]) – The list of special tokens (e.g., pad, unk, bos and eos) that will be prepended to the vocabulary. Default: [].

  • unk_index (int) – The index of the unk token. Default: 0.
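To make the role of each parameter concrete, here is a minimal standalone sketch of how a vocabulary could be assembled from these four arguments. It does not import supar and is not the library's implementation: build_vocab is a hypothetical helper, and both the ordering of the non-special tokens (counter insertion order) and the fallback of unseen tokens to unk_index are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def build_vocab(counter, min_freq=1, specials=(), unk_index=0):
    """Hypothetical helper mirroring the documented parameters."""
    # Special tokens are prepended, so they occupy the lowest indices.
    itos = list(specials)
    # Keep tokens that reach the frequency threshold. Following the
    # counter's insertion order is an assumption of this sketch.
    itos += [tok for tok, freq in counter.items()
             if freq >= min_freq and tok not in specials]
    # Assumed behaviour: tokens missing from the vocabulary get unk_index.
    stoi = defaultdict(lambda: unk_index)
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return itos, stoi


counter = Counter("the cat sat on the mat the end".split())
itos, stoi = build_vocab(counter, min_freq=2,
                         specials=["<pad>", "<unk>"], unk_index=1)
print(itos)         # ['<pad>', '<unk>', 'the']
print(stoi["the"])  # 2
print(stoi["cat"])  # 1
```

With min_freq=2 only "the" (seen three times) survives the threshold, and unk_index=1 matches the position of "<unk>" in specials.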

itos

A list of token strings indexed by their numerical identifiers.

stoi

A defaultdict object mapping token strings to numerical identifiers.
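The two attributes are inverse views of the same mapping. The sketch below illustrates that round trip with plain Python objects rather than a real Vocab instance; the token list and the choice of unk_index = 1 are made up for illustration, and the fallback to the unk index for unknown tokens is an assumption.

```python
from collections import defaultdict

# Illustrative data only: not taken from a real supar vocabulary.
itos = ["<pad>", "<unk>", "the", "cat"]
unk_index = 1

# stoi inverts itos; missing keys yield unk_index (assumed behaviour).
stoi = defaultdict(lambda: unk_index,
                   {tok: i for i, tok in enumerate(itos)})

# Numericalize a token sequence, then map the ids back to strings.
ids = [stoi[tok] for tok in ["the", "dog", "cat"]]
tokens = [itos[i] for i in ids]

print(ids)     # [2, 1, 3]
print(tokens)  # ['the', '<unk>', 'cat']
```

One property of defaultdict worth keeping in mind: indexing with a missing key stores that key, so after the lookup above "dog" is present in stoi. Use stoi.get(tok, unk_index) if the mapping should stay unchanged.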