Stanford tagger - tagging speed
Regarding the Stanford tagger: I've trained a model for it on my own labelled corpus. However, I've realised that the tagging speed of my model is much slower than the default wsj left3words tagger model. What might contribute to this? And how do I improve the speed of my model? (I've added 3 or 4 custom tags in addition to the Penn Treebank tagset.)
While adding more features (in arch) makes it a bit slower in general (as feature extraction is one of the main runtime costs), the two big determinants of speed are:
- Number of context tags used in features: left3words uses the previous and second-previous tag (2 context tags) and so is fairly fast; bidirectional uses 4 (two on each side) and so is very slow. A tagger that uses just 1 or 0 context tags is faster still.
- Size of the tag set in general, and in particular the size of the set of open class tags that can be applied to unknown words. (But adding 3 or 4 should make almost no difference -- it's problematic when you have a tag set with hundreds of tags.)
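To make that concrete, here is a sketch of a training properties file tuned for speed. This is an assumption-laden illustration, not a tested recipe: the property names (`arch`, `openClassTags`, `trainFile`, `model`) follow the Stanford tagger's standard properties-file training interface, but the exact `arch` feature names and their argument forms should be checked against the documentation for your tagger version. `order(1)` conditions on only the previous tag (faster than left3words' two-tag context), and `openClassTags` restricts which tags can be proposed for unknown words:

```
# Train with: java -cp stanford-postagger.jar \
#   edu.stanford.nlp.tagger.maxent.MaxentTagger -props fast.props
model = my-fast-model.tagger
trainFile = my-corpus.txt
tagSeparator = _
encoding = UTF-8

# order(1): only 1 context tag, instead of order(2) as in left3words.
# words(-1,1) and prefix/suffix features keep accuracy reasonable.
arch = words(-1,1),order(1),prefix(6),suffix(6)

# Hypothetical list -- restrict unknown-word tagging to your actual
# open classes (include any custom open-class tags you added).
openClassTags = NN NNS NNP NNPS VB VBD VBG VBN VBP VBZ JJ JJR JJS RB RBR RBS
```

Dropping from `order(2)` to `order(1)` shrinks the search space at each position, which is usually the biggest single win; trimming `openClassTags` mainly speeds up sentences with many unknown words.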