Dependency Parsers
BiaffineDependencyParser
- class supar.parsers.dep.BiaffineDependencyParser(*args, **kwargs)
The implementation of Biaffine Dependency Parser [Dozat & Manning 2017].
- MODEL
alias of supar.models.dep.BiaffineDependencyModel
- train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, punct=False, tree=False, proj=False, partial=False, verbose=True, **kwargs)
- Parameters
train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
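The interaction between buckets and batch_size can be sketched as follows: sentences are grouped by length, and each batch is filled until its token count would exceed batch_size. This is a minimal illustration only. The equal-size split in bucket_by_length and both function names are assumptions made for the example; the library's own bucketing strategy may differ.

```python
from typing import Iterator, List


def bucket_by_length(lengths: List[int], n_buckets: int) -> List[List[int]]:
    """Split sentence indices into at most n_buckets groups of similar length.

    Assumption: a plain equal-size split over the length-sorted indices.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    n_buckets = max(1, min(n_buckets, len(order)))
    size, extra = divmod(len(order), n_buckets)
    buckets, start = [], 0
    for b in range(n_buckets):
        end = start + size + (1 if b < extra else 0)
        buckets.append(order[start:end])
        start = end
    return buckets


def token_batches(lengths: List[int], n_buckets: int, batch_size: int) -> Iterator[List[int]]:
    """Yield batches of sentence indices holding at most batch_size tokens.

    A single sentence longer than batch_size still forms its own batch.
    """
    for bucket in bucket_by_length(lengths, n_buckets):
        batch, n_tokens = [], 0
        for i in bucket:
            if batch and n_tokens + lengths[i] > batch_size:
                yield batch
                batch, n_tokens = [], 0
            batch.append(i)
            n_tokens += lengths[i]
        if batch:
            yield batch


lengths = [5, 12, 7, 30, 6, 11, 29, 8]  # tokens per sentence
batches = list(token_batches(lengths, n_buckets=2, batch_size=20))
```

Grouping sentences of similar length keeps padding low, while the token budget keeps memory use roughly constant across batches.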
- evaluate(data, buckets=8, batch_size=5000, punct=False, tree=True, proj=False, partial=False, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for evaluation. Both a list of instances and a filename are allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.
- Returns
The loss scalar and evaluation results.
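To illustrate what punct=False means for the evaluation results, the sketch below computes unlabeled and labeled attachment scores (UAS/LAS) and skips punctuation tokens unless punct=True. The function names and the Unicode-category punctuation test are assumptions for this example, not the library's metric code.

```python
import unicodedata


def is_punct(word: str) -> bool:
    """True if every character of the word is Unicode punctuation."""
    return bool(word) and all(unicodedata.category(ch).startswith("P") for ch in word)


def attachment_scores(words, gold_heads, gold_rels, pred_heads, pred_rels, punct=False):
    """Return (UAS, LAS); punctuation tokens are skipped unless punct=True."""
    total = arc_ok = rel_ok = 0
    for w, gh, gr, ph, pr in zip(words, gold_heads, gold_rels, pred_heads, pred_rels):
        if not punct and is_punct(w):
            continue
        total += 1
        if gh == ph:
            arc_ok += 1
            if gr == pr:  # a label only counts when the head is also correct
                rel_ok += 1
    if total == 0:
        return 0.0, 0.0
    return arc_ok / total, rel_ok / total


words = ["She", "reads", "books", "."]
gold_heads, gold_rels = [2, 0, 2, 2], ["nsubj", "root", "obj", "punct"]
pred_heads, pred_rels = [2, 0, 2, 3], ["nsubj", "root", "iobj", "punct"]

uas, las = attachment_scores(words, gold_heads, gold_rels, pred_heads, pred_rels)
uas_p, las_p = attachment_scores(words, gold_heads, gold_rels, pred_heads, pred_rels, punct=True)
```

In this toy sentence the only head error is on the final period, so UAS is perfect when punctuation is ignored and drops once it is counted.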
- predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, tree=True, proj=False, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for prediction. Both a list of instances and a filename are allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.
- Returns
A Dataset object that stores the predicted results.
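As a rough picture of what prob=True could report for a locally normalized parser, the sketch below turns per-word head scores into probabilities with a softmax and reads off the probability of each predicted head. The score values and variable names are made up, and normalizing each word's head scores independently is an assumption of this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax; -inf entries receive probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


NEG_INF = float("-inf")

# arc_scores[d][h]: hypothetical score of head h (0 = ROOT) for word d + 1.
# A word cannot be its own head, so that entry is masked with -inf.
arc_scores = [
    [0.5, NEG_INF, 2.0, 0.1],
    [3.0, 0.2, NEG_INF, 0.4],
    [0.1, 0.3, 1.5, NEG_INF],
]

arc_probs = [softmax(row) for row in arc_scores]
pred_heads = [max(range(len(p)), key=p.__getitem__) for p in arc_probs]
head_probs = [p[h] for p, h in zip(arc_probs, pred_heads)]
```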
- classmethod load(path, reload=False, src='github', **kwargs)
Loads a parser with data fields and pretrained model parameters.
- Parameters
path (str) – Either a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'biaffine-dep-en', or a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': GitHub release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.
Examples
>>> from supar import Parser
>>> parser = Parser.load('biaffine-dep-en')
>>> parser = Parser.load('./ptb.biaffine.dep.lstm.char')
- classmethod build(path, min_freq=2, fix_len=20, **kwargs)
Builds a brand-new Parser, including initialization of all data fields and model parameters.
- Parameters
path (str) – The path of the model to be saved.
min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Required if taking words as encoder input. Default: 2.
fix_len (int) – The max length of all subword pieces. The excess part of each piece will be truncated. Required if using CharLSTM/BERT. Default: 20.
kwargs (dict) – A dict holding the unconsumed arguments.
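The effect of min_freq and fix_len can be sketched with a toy vocabulary: words seen fewer than min_freq times map to an unknown token, and each word's character pieces are cut to at most fix_len. The helper names and the special tokens below are hypothetical and only illustrate the idea.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(sentences, min_freq=2):
    """Keep only words that occur at least min_freq times."""
    counts = Counter(word for sent in sentences for word in sent)
    kept = sorted(w for w, c in counts.items() if c >= min_freq)
    return {w: i for i, w in enumerate([PAD, UNK] + kept)}


def encode(sentence, vocab):
    """Map words to indices; out-of-vocabulary words fall back to UNK."""
    return [vocab.get(w, vocab[UNK]) for w in sentence]


def truncate_pieces(word, fix_len=20):
    """Split a word into characters and drop everything past fix_len."""
    return list(word)[:fix_len]


sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat"]]
vocab = build_vocab(sentences, min_freq=2)
ids = encode(["the", "dog"], vocab)  # "dog" occurs once, so it becomes UNK
pieces = truncate_pieces("parsing", fix_len=4)
```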
CRFDependencyParser
- class supar.parsers.dep.CRFDependencyParser(*args, **kwargs)
The implementation of first-order CRF Dependency Parser [Zhang et al. 2020a].
- MODEL
alias of supar.models.dep.CRFDependencyModel
- train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, punct=False, mbr=True, tree=False, proj=False, partial=False, verbose=True, **kwargs)
- Parameters
train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
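Gradient accumulation, which update_steps controls, can be illustrated on a one-parameter regression: gradients from update_steps consecutive batches are averaged before a single parameter update. The toy model, the function names and the 1/update_steps scaling are assumptions of this sketch, which also drops any leftover batches at the end.

```python
def grad(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train(batches, update_steps=1, lr=0.1, w=0.0):
    """Plain SGD that updates w once every update_steps batches."""
    acc = 0.0
    for step, batch in enumerate(batches, 1):
        acc += grad(w, batch) / update_steps  # scale so the sum is an average
        if step % update_steps == 0:
            w -= lr * acc  # one update per update_steps batches
            acc = 0.0
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Two small batches with accumulation match one large batch without it.
w_accumulated = train([data[:2], data[2:]], update_steps=2)
w_large_batch = train([data], update_steps=1)
```

With equally sized batches, accumulating over two batches of two examples gives the same update as one batch of four, which is why the setting is useful when a larger batch does not fit in memory.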
- evaluate(data, buckets=8, batch_size=5000, punct=False, mbr=True, tree=True, proj=True, partial=False, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for evaluation. Both a list of instances and a filename are allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.
- Returns
The loss scalar and evaluation results.
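The two structural constraints behind tree and proj can be checked directly on a list of head indices. The sketch below assumes heads[i] holds the head of word i + 1 with 0 standing for ROOT, and treats a well-formed tree as acyclic with exactly one root word; both conventions are assumptions of this example.

```python
def is_tree(heads, single_root=True):
    """True if every word reaches ROOT (0) without cycles or self-loops."""
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False
    if single_root and sum(h == 0 for h in heads) != 1:
        return False
    for i in range(1, n + 1):
        node, steps = i, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:  # more than n hops means we are stuck in a cycle
                return False
    return True


def is_projective(heads):
    """True if no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, 1)]
    return not any(l1 < l2 < r1 < r2 for l1, r1 in arcs for l2, r2 in arcs)


projective = [2, 0, 2]         # words 1 and 3 attach to word 2, the root
cyclic = [2, 3, 1]             # 1 -> 2 -> 3 -> 1, never reaches ROOT
non_projective = [3, 0, 2, 2]  # arc 3 -> 1 crosses arc ROOT -> 2
```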
- predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, mbr=True, tree=True, proj=True, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for prediction. Both a list of instances and a filename are allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.
- Returns
A Dataset object that stores the predicted results.
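MBR (minimum Bayes risk) decoding selects the tree with the highest sum of arc marginals rather than the highest raw score. The brute-force sketch below enumerates every tree of a three-word sentence to make the idea concrete. It is only feasible for tiny inputs and is not how a CRF parser computes marginals in practice; all names and score values are hypothetical.

```python
import itertools
import math


def is_tree(heads):
    """True if every word reaches ROOT (0); rejects cycles and self-loops."""
    n = len(heads)
    for i in range(1, n + 1):
        node, steps = i, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:
                return False
    return True


def arc_marginals(scores):
    """Enumerate all trees and return (trees, marginals).

    scores[d][h] is the score of head h (0 = ROOT) for word d + 1;
    marginals[d][h] is the total probability of trees containing that arc.
    """
    n = len(scores)
    trees = [t for t in itertools.product(range(n + 1), repeat=n) if is_tree(t)]
    weights = [math.exp(sum(scores[d][h] for d, h in enumerate(t))) for t in trees]
    z = sum(weights)
    marginals = [[0.0] * (n + 1) for _ in range(n)]
    for tree, weight in zip(trees, weights):
        for d, h in enumerate(tree):
            marginals[d][h] += weight / z
    return trees, marginals


def mbr_decode(scores):
    """Return the tree that maximizes the sum of its arc marginals."""
    trees, marginals = arc_marginals(scores)
    return max(trees, key=lambda t: sum(marginals[d][h] for d, h in enumerate(t)))


scores = [
    [0.0, 0.0, 3.0, 0.0],  # word 1 prefers word 2 as head
    [3.0, 0.0, 0.0, 0.0],  # word 2 prefers ROOT
    [0.0, 0.0, 3.0, 0.0],  # word 3 prefers word 2 as head
]
trees, marginals = arc_marginals(scores)
best = mbr_decode(scores)
```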
- classmethod load(path, reload=False, src='github', **kwargs)
Loads a parser with data fields and pretrained model parameters.
- Parameters
path (str) – Either a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'crf-dep-en', or a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': GitHub release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.
Examples
>>> from supar import Parser
>>> parser = Parser.load('crf-dep-en')
>>> parser = Parser.load('./ptb.crf.dep.lstm.char')
CRF2oDependencyParser
- class supar.parsers.dep.CRF2oDependencyParser(*args, **kwargs)
The implementation of second-order CRF Dependency Parser [Zhang et al. 2020a].
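What "second-order" adds over a first-order model can be sketched as an extra score for pairs of adjacent siblings, that is, consecutive dependents on the same side of a head. The example below assumes that adjacent-sibling factorization; the dictionaries of scores and the function names are hypothetical stand-ins for model outputs.

```python
def adjacent_siblings(heads):
    """Return (head, sib, dep) triples of consecutive same-side dependents.

    heads[i] is the head of word i + 1 (0 = ROOT); sib is the dependent
    that lies closer to the head than dep.
    """
    n = len(heads)
    triples = []
    for h in range(n + 1):
        deps = [d for d in range(1, n + 1) if heads[d - 1] == h]
        left = sorted((d for d in deps if d < h), reverse=True)  # closest first
        right = sorted(d for d in deps if d > h)                 # closest first
        for side in (left, right):
            triples.extend((h, sib, dep) for sib, dep in zip(side, side[1:]))
    return triples


def tree_score(heads, arc_score, sib_score):
    """First-order arc scores plus second-order adjacent-sibling scores."""
    total = sum(arc_score.get((h, d), 0.0) for d, h in enumerate(heads, 1))
    total += sum(sib_score.get(t, 0.0) for t in adjacent_siblings(heads))
    return total


heads = [2, 0, 2, 2, 4]  # word 2 is the root and governs words 1, 3 and 4
arc_score = {(2, 1): 1.0, (0, 2): 2.0, (2, 3): 0.5, (2, 4): 0.5, (4, 5): 1.0}
sib_score = {(2, 3, 4): 0.25}  # words 3 and 4 are adjacent right dependents of 2

siblings = adjacent_siblings(heads)
score = tree_score(heads, arc_score, sib_score)
```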
- MODEL
alias of supar.models.dep.CRF2oDependencyModel
- train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, punct=False, mbr=True, tree=False, proj=False, partial=False, verbose=True, **kwargs)
- Parameters
train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
- evaluate(data, buckets=8, batch_size=5000, punct=False, mbr=True, tree=True, proj=True, partial=False, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for evaluation. Both a list of instances and a filename are allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.
- Returns
The loss scalar and evaluation results.
- predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, mbr=True, tree=True, proj=True, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for prediction. Both a list of instances and a filename are allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
mbr (bool) – If True, performs MBR decoding. Default: True.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.
- Returns
A Dataset object that stores the predicted results.
- classmethod load(path, reload=False, src='github', **kwargs)
Loads a parser with data fields and pretrained model parameters.
- Parameters
path (str) – Either a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'crf2o-dep-en', or a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': GitHub release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.
Examples
>>> from supar import Parser
>>> parser = Parser.load('crf2o-dep-en')
>>> parser = Parser.load('./ptb.crf2o.dep.lstm.char')
- classmethod build(path, min_freq=2, fix_len=20, **kwargs)
Builds a brand-new Parser, including initialization of all data fields and model parameters.
- Parameters
path (str) – The path of the model to be saved.
min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 2.
fix_len (int) – The max length of all subword pieces. The excess part of each piece will be truncated. Required if using CharLSTM/BERT. Default: 20.
kwargs (dict) – A dict holding the unconsumed arguments.
VIDependencyParser
- class supar.parsers.dep.VIDependencyParser(*args, **kwargs)
The implementation of Dependency Parser using Variational Inference [Wang & Tu 2020].
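As a rough intuition for variational inference in this setting, the sketch below runs a generic mean-field update over head distributions: each word's distribution over heads is repeatedly re-estimated from its arc scores plus the expected sibling scores under the other words' current distributions. This is a simplified illustration under assumed factor shapes and hypothetical names, not the model described in the cited paper or the library's code.

```python
import math

NEG_INF = float("-inf")


def softmax(scores):
    """Numerically stable softmax; -inf entries receive probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field(s_arc, s_sib, n_iter=3):
    """Approximate head posteriors q[d][h] by synchronous mean-field updates.

    s_arc[d][h]: score of head h (0 = ROOT) for word d + 1.
    s_sib[d][k][h]: score for words d + 1 and k + 1 sharing the head h.
    """
    n = len(s_arc)
    q = [softmax(row) for row in s_arc]  # start from first-order posteriors
    for _ in range(n_iter):
        new_q = []
        for d in range(n):
            logits = []
            for h in range(n + 1):
                # expected sibling score given the other words' current beliefs
                msg = sum(q[k][h] * s_sib[d][k][h] for k in range(n) if k != d)
                logits.append(s_arc[d][h] + msg)
            new_q.append(softmax(logits))
        q = new_q
    return q


n = 3
# Uniform arc scores, with self-heads masked out.
s_arc = [[NEG_INF if h == d + 1 else 0.0 for h in range(n + 1)] for d in range(n)]
zero_sib = [[[0.0] * (n + 1) for _ in range(n)] for _ in range(n)]

# Reward words 1 and 3 for sharing word 2 as their head.
s_sib = [[[0.0] * (n + 1) for _ in range(n)] for _ in range(n)]
s_sib[0][2][2] = s_sib[2][0][2] = 2.0

q_first_order = mean_field(s_arc, zero_sib)
q_second_order = mean_field(s_arc, s_sib)
```

With all sibling scores at zero the updates leave the first-order posteriors unchanged, whereas the sibling reward shifts probability mass toward the shared head.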
- MODEL
alias of supar.models.dep.VIDependencyModel
- train(train, dev, test, buckets=32, batch_size=5000, update_steps=1, punct=False, tree=False, proj=False, partial=False, verbose=True, **kwargs)
- Parameters
train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs.
- evaluate(data, buckets=8, batch_size=5000, punct=False, tree=True, proj=True, partial=False, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for evaluation. Both a list of instances and a filename are allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores the punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
partial (bool) – True denotes that the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating evaluation configs.
- Returns
The loss scalar and evaluation results.
- predict(data, pred=None, lang=None, buckets=8, batch_size=5000, prob=False, tree=True, proj=True, verbose=True, **kwargs)
- Parameters
data (list[list] or str) – The data for prediction. Both a list of instances and a filename are allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: True.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding unconsumed arguments for updating prediction configs.
- Returns
A Dataset object that stores the predicted results.
- classmethod load(path, reload=False, src='github', **kwargs)
Loads a parser with data fields and pretrained model parameters.
- Parameters
path (str) – Either a string with the shortcut name of a pretrained model defined in supar.MODEL to load from cache or download, e.g., 'vi-dep-en', or a local path to a pretrained model, e.g., ./<path>/model.
reload (bool) – Whether to discard the existing cache and force a fresh download. Default: False.
src (str) – Specifies where to download the model. 'github': GitHub release page. 'hlt': hlt homepage, only accessible from 9:00 to 18:00 (UTC+8). Default: 'github'.
kwargs (dict) – A dict holding unconsumed arguments for updating training configs and initializing the model.
Examples
>>> from supar import Parser
>>> parser = Parser.load('vi-dep-en')
>>> parser = Parser.load('./ptb.vi.dep.lstm.char')