Constituency Parsers

CRFConstituencyParser

class supar.parsers.const.CRFConstituencyParser(*args, **kwargs)

The implementation of CRF Constituency Parser [Zhang et al. 2020b].

MODEL: alias of supar.models.const.CRFConstituencyModel

train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, mbr=True, delete={'', '!', "''", ',', '-NONE-', '.', ':', '?', 'S1', 'TOP', '``'}, equal={'ADVP': 'PRT'}, verbose=True, **kwargs)

Parameters

train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
mbr (bool) – If True, performs MBR decoding. Default: True.
delete (set[str]) – A set of labels that will not be taken into consideration during evaluation. Default: {'TOP', 'S1', '-NONE-', ',', ':', '``', "''", '.', '?', '!', ''}.
equal (dict[str, str]) – The pairs in the dict are considered equivalent during evaluation. Default: {'ADVP': 'PRT'}.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
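The interaction between buckets and batch_size can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers make_buckets and make_batches are hypothetical names, and the sketch groups sentences by sorting on length and slicing evenly, which may differ from how SuPar actually forms its buckets. The point it shows is that batch_size counts tokens, not sentences.

```python
from typing import List


def make_buckets(lengths: List[int], n_buckets: int) -> List[List[int]]:
    """Group sentence indices into at most n_buckets groups of similar length.

    Assumption: simple sort-and-slice grouping, not the library's own method.
    """
    if not lengths:
        return []
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    n_buckets = max(1, min(n_buckets, len(order)))
    size = -(-len(order) // n_buckets)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def make_batches(lengths: List[int], n_buckets: int, batch_size: int) -> List[List[int]]:
    """Split each bucket into batches holding at most batch_size tokens.

    A single sentence longer than batch_size still gets a batch of its own.
    """
    batches = []
    for bucket in make_buckets(lengths, n_buckets):
        batch, n_tokens = [], 0
        for i in bucket:
            # Close the current batch if adding this sentence would exceed the budget.
            if batch and n_tokens + lengths[i] > batch_size:
                batches.append(batch)
                batch, n_tokens = [], 0
            batch.append(i)
            n_tokens += lengths[i]
        if batch:
            batches.append(batch)
    return batches


lengths = [5, 30, 7, 28, 6, 31, 8, 29]  # tokens per sentence
batches = make_batches(lengths, n_buckets=2, batch_size=60)
print(batches)
```

Short sentences end up packed many to a batch while long ones get only a few per batch, so each batch carries a comparable number of tokens and little padding.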

evaluate(data, buckets=8, batch_size=5000, mbr=True, delete={'', '!', "''", ',', '-NONE-', '.', ':', '?', 'S1', 'TOP', '``'}, equal={'ADVP': 'PRT'}, verbose=True, **kwargs)

Parameters

data (list[list] or str) – The data for evaluation; either a list of instances or a filename is allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
mbr (bool) – If True, performs MBR decoding. Default: True.
delete (set[str]) – A set of labels that will not be taken into consideration during evaluation. Default: {'TOP', 'S1', '-NONE-', ',', ':', '``', "''", '.', '?', '!', ''}.
equal (dict[str, str]) – The pairs in the dict are considered equivalent during evaluation. Default: {'ADVP': 'PRT'}.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.

Returns

The loss scalar and evaluation results.
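To make the roles of delete and equal concrete, here is a minimal sketch of labeled bracket scoring. It is not SuPar's metric implementation: normalize and bracket_f1 are hypothetical helpers, spans are assumed to be (start, end, label) tuples, and the sketch only filters and merges bracket labels, ignoring any token-level effects a full scorer might apply.

```python
from collections import Counter

DELETE = {'TOP', 'S1', '-NONE-', ',', ':', '``', "''", '.', '?', '!', ''}
EQUAL = {'ADVP': 'PRT'}


def normalize(spans, delete=DELETE, equal=EQUAL):
    """Drop spans whose label is in `delete`, then map labels through `equal`."""
    return Counter(
        (i, j, equal.get(label, label))
        for i, j, label in spans
        if label not in delete
    )


def bracket_f1(pred, gold, delete=DELETE, equal=EQUAL):
    """Return (precision, recall, f1) over the normalized labeled brackets."""
    p = normalize(pred, delete, equal)
    g = normalize(gold, delete, equal)
    matched = sum((p & g).values())  # multiset intersection
    precision = matched / sum(p.values()) if p else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


gold = [(0, 5, 'TOP'), (0, 5, 'S'), (0, 1, 'NP'), (1, 5, 'VP'), (2, 3, 'ADVP')]
pred = [(0, 5, 'TOP'), (0, 5, 'S'), (0, 1, 'NP'), (1, 5, 'VP'), (2, 3, 'PRT')]

print(bracket_f1(pred, gold))            # ADVP and PRT count as the same label
print(bracket_f1(pred, gold, equal={}))  # without the mapping, one bracket mismatches
```

With the default mapping the ADVP/PRT disagreement is not penalized, and the TOP bracket never enters the counts at all.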

predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, mbr=True, verbose=True, **kwargs)

Parameters

data (list[list] or str) – The data for prediction; either a list of instances or a filename is allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.

Returns

A Dataset object that stores the predicted results.
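The mbr flag can be illustrated with a small sketch. It assumes that MBR decoding here means choosing the binary tree whose spans have the largest total marginal probability, found with a CKY-style search; the function mbr_decode, the dict-based marginals, and the toy numbers are all hypothetical, and the library's batched tensor implementation will look different.

```python
from functools import lru_cache


def mbr_decode(marginals, n):
    """Return (score, spans) of the binary tree maximizing the sum of span marginals.

    marginals: dict mapping fencepost spans (i, j), 0 <= i < j <= n, to a probability.
    Spans missing from the dict are treated as having marginal 0.
    """
    if n < 1:
        raise ValueError("need at least one token")

    @lru_cache(maxsize=None)
    def best(i, j):
        score = marginals.get((i, j), 0.0)
        if j - i == 1:  # single token: no split point to choose
            return score, ((i, j),)
        best_score, best_spans = float('-inf'), ()
        for k in range(i + 1, j):  # try every split point
            left_score, left_spans = best(i, k)
            right_score, right_spans = best(k, j)
            if left_score + right_score > best_score:
                best_score = left_score + right_score
                best_spans = left_spans + right_spans
        return score + best_score, ((i, j),) + best_spans

    return best(0, n)


# Toy marginals for a 3-token sentence: the span (1, 3) is more probable than (0, 2).
marginals = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0,
    (0, 2): 0.3, (1, 3): 0.7,
}
score, spans = mbr_decode(marginals, n=3)
print(score, spans)
```

The decoder picks the right-branching tree because it contains the higher-marginal span (1, 3) rather than (0, 2).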

classmethod load(path, reload=False, src='github', **kwargs)

Loads a parser with data fields and pretrained model parameters.

Parameters

path (str) –
- a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'crf-con-en'.
- a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': github release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.

Examples

>>> from supar import Parser
>>> parser = Parser.load('crf-con-en')
>>> parser = Parser.load('./ptb.crf.con.lstm.char')

classmethod build(path, min_freq=2, fix_len=20, **kwargs)

Build a brand-new Parser, including initialization of all data fields and model parameters.

Parameters

path (str) – The path of the model to be saved.
min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 2.
fix_len (int) – The max length of all subword pieces. The excess part of each piece will be truncated. Required if using CharLSTM/BERT. Default: 20.
kwargs (dict) – A dict holding the unconsumed arguments.
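What min_freq and fix_len control can be shown with a small standalone sketch. The helpers build_vocab and char_ids are hypothetical and are not SuPar's field classes; in particular, padding every token up to fix_len is a simplification made here to keep the output shape obvious.

```python
from collections import Counter


def build_vocab(sentences, min_freq=2, specials=('<pad>', '<unk>')):
    """Keep only tokens seen at least min_freq times; rarer tokens map to <unk>."""
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = list(specials) + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def char_ids(token, char_vocab, fix_len=20, pad=0, unk=1):
    """Map a token to character ids, truncating anything beyond fix_len pieces."""
    ids = [char_vocab.get(ch, unk) for ch in token[:fix_len]]
    return ids + [pad] * (fix_len - len(ids))  # simplification: pad up to fix_len


sentences = [['the', 'cat', 'sat'], ['the', 'dog', 'sat'], ['a', 'cat']]
word_vocab = build_vocab(sentences, min_freq=2)
char_vocab = build_vocab([list(tok) for sent in sentences for tok in sent], min_freq=1)

print(word_vocab)                              # 'dog' and 'a' occur once and are left out
print(char_ids('cats', char_vocab, fix_len=3))  # the fourth character is truncated
print(char_ids('a', char_vocab, fix_len=3))     # short tokens are padded
```

Raising min_freq shrinks the vocabulary at the cost of more unknown tokens, while fix_len bounds the per-token cost of character- or subword-level encoders.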

VIConstituencyParser

class supar.parsers.const.VIConstituencyParser(*args, **kwargs)

The implementation of Constituency Parser using variational inference.

MODEL: alias of supar.models.const.VIConstituencyModel

train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, delete={'', '!', "''", ',', '-NONE-', '.', ':', '?', 'S1', 'TOP', '``'}, equal={'ADVP': 'PRT'}, verbose=True, **kwargs)

Parameters

train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
delete (set[str]) – A set of labels that will not be taken into consideration during evaluation. Default: {'TOP', 'S1', '-NONE-', ',', ':', '``', "''", '.', '?', '!', ''}.
equal (dict[str, str]) – The pairs in the dict are considered equivalent during evaluation. Default: {'ADVP': 'PRT'}.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
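The effect of update_steps (gradient accumulation) can be sketched without any deep learning framework. The toy below fits a single weight w to y = w * x with squared error; grad and train_epoch are hypothetical helpers, and the assumption is that each mini-batch gradient is scaled by 1 / update_steps and the weight is updated once every update_steps batches. The actual training loop in the library may differ in detail.

```python
def grad(w, batch):
    """Gradient of the mean squared error (w * x - y)^2 over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_epoch(w, batches, lr, update_steps):
    """One pass over the batches, updating w once every update_steps batches."""
    accumulated = 0.0
    for step, batch in enumerate(batches, start=1):
        # Scale each contribution so the accumulated value is an average.
        accumulated += grad(w, batch) / update_steps
        if step % update_steps == 0:
            w -= lr * accumulated
            accumulated = 0.0
    # Note: gradients left over from an incomplete final group are dropped here.
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Two small batches with accumulation...
w_accumulated = train_epoch(0.0, [data[:2], data[2:]], lr=0.01, update_steps=2)
# ...versus one large batch without it.
w_large_batch = train_epoch(0.0, [data], lr=0.01, update_steps=1)

print(w_accumulated, w_large_batch)
```

With equally sized mini-batches the two runs take the same step, which is why raising update_steps is a common way to emulate a larger batch_size when memory is limited.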

evaluate(data, buckets=8, batch_size=5000, delete={'', '!', "''", ',', '-NONE-', '.', ':', '?', 'S1', 'TOP', '``'}, equal={'ADVP': 'PRT'}, verbose=True, **kwargs)

Parameters

data (list[list] or str) – The data for evaluation; either a list of instances or a filename is allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
delete (set[str]) – A set of labels that will not be taken into consideration during evaluation. Default: {'TOP', 'S1', '-NONE-', ',', ':', '``', "''", '.', '?', '!', ''}.
equal (dict[str, str]) – The pairs in the dict are considered equivalent during evaluation. Default: {'ADVP': 'PRT'}.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.

Returns

The loss scalar and evaluation results.

predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, verbose=True, **kwargs)

Parameters

data (list[list] or str) – The data for prediction; either a list of instances or a filename is allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.

Returns

A Dataset object that stores the predicted results.

classmethod load(path, reload=False, src='github', **kwargs)

Loads a parser with data fields and pretrained model parameters.

Parameters

path (str) –
- a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'vi-con-en'.
- a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': github release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.

Examples

>>> from supar import Parser
>>> parser = Parser.load('vi-con-en')
>>> parser = Parser.load('./ptb.vi.con.lstm.char')

SuPar 1.1.4 documentation
