Trying to use MEGAM as an NLTK ClassifierBasedPOSTagger?

2023-01-30 20:31

I am currently trying to build a general-purpose (or as general as is practical) POS tagger with NLTK. I have dabbled with the Brown and Treebank corpora for training, but will probably settle on the Treebank corpus.

Learning as I go, I am finding that the classifier-based POS taggers are the most accurate. The Maximum Entropy classifier is meant to be the most accurate, but I find it uses so much memory (and processing time) that I have to significantly reduce the training dataset, so the end result is less accurate than using the default Naive Bayes classifier.

It has been suggested that I use MEGAM. NLTK has some support for MEGAM, but all the examples I have found are for general classifiers (e.g. a text classifier that uses a vector of word features), rather than a more specific POS tagger. Without having to recreate my own POS feature extractor and compiler (i.e. I prefer to use the one already in NLTK), how can I use the MEGAM MaxEnt classifier? That is, how can I drop it into some existing MaxEnt code that is along the lines of:

maxent_tagger = ClassifierBasedPOSTagger(train=training_sentences,
                                        classifier_builder=MaxentClassifier.train )

This one-liner should work for training a MEGAM MaxentClassifier for the ClassifierBasedPOSTagger. Of course, that assumes MEGAM is already installed (go here to download).

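from nltk.classify import MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger

# Route training through MEGAM.
maxent_tagger = ClassifierBasedPOSTagger(
    train=train_sents,
    classifier_builder=lambda train_feats: MaxentClassifier.train(
        train_feats, algorithm='megam', max_iter=10, min_lldelta=0.1))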
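For anyone who wants to try this end to end, here is a rough sketch rather than a tested recipe. The helper names (megam_builder, token_accuracy, train_megam_tagger) are made up for illustration, and it assumes NLTK is installed and the megam binary is somewhere NLTK can find it. NLTK is imported inside the functions so the file still loads on a machine where it is missing.

```python
def megam_builder(train_feats, max_iter=10, min_lldelta=0.1):
    """Hypothetical named replacement for the lambda in the one-liner."""
    from nltk.classify import MaxentClassifier
    return MaxentClassifier.train(
        train_feats, algorithm="megam",
        max_iter=max_iter, min_lldelta=min_lldelta)


def token_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, predicted_sents):
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            total += 1
            if gold_tag == pred_tag:
                correct += 1
    return correct / total if total else 0.0


def train_megam_tagger(train_sents, test_sents):
    """Train a ClassifierBasedPOSTagger through MEGAM and score it."""
    import nltk
    from nltk.tag import untag
    from nltk.tag.sequential import ClassifierBasedPOSTagger

    # Fail early (LookupError) if NLTK cannot locate the megam binary.
    nltk.config_megam()

    tagger = ClassifierBasedPOSTagger(
        train=train_sents, classifier_builder=megam_builder)

    # Strip the gold tags, re-tag the held-out sentences, then compare.
    predicted = tagger.tag_sents([untag(sent) for sent in test_sents])
    return tagger, token_accuracy(test_sents, predicted)
```

With the Treebank sample that ships with NLTK, one way to call it would be to take sents = nltk.corpus.treebank.tagged_sents() and run train_megam_tagger(sents[:3000], sents[3000:]). The 3000-sentence split is arbitrary.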

For future users:

MEGAM is now available on the Mac through Homebrew:

$ brew tap homebrew/science
$ brew install megam

If you don't have XQuartz, it might ask you to get that first. Here is the direct download link: http://xquartz.macosforge.org/downloads/SL/XQuartz-2.7.5_rc4.dmg
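After installing, it may be worth confirming that the binary is visible before starting a long training run. The snippet below is only a sketch: find_megam is a hypothetical helper and the binary names it tries are an assumption. NLTK does its own lookup, and nltk.config_megam also accepts an explicit path to the binary if it lives somewhere unusual.

```python
import shutil


def find_megam(names=("megam", "megam.opt")):
    """Return the full path of the first matching binary on PATH, or None.

    The default names are an assumption about what the install provides.
    """
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def describe_megam():
    """Build a one-line status message about the MEGAM install."""
    path = find_megam()
    if path is None:
        return "megam was not found on PATH"
    return "megam found at " + path
```

Printing describe_megam() from a Python shell gives a quick yes or no before any training starts.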
