开发者

Trying to use MEGAM as an NLTK ClassifierBasedPOSTagger?

I am currently trying to build a general purpose (or as general as is practical) POS tagger with NLTK. I have dabbled with the brown and treebank corpora for training, but will probably be settling on the treebank corpus.

Learning as I go, I am finding the classifier POS taggers are the most accurate. The Maximum Entity classifier is meant to be the most accurate, but I find it uses so much memory (and processing time) that I have to significantly reduce the training dataset, so the end result is less accurate than using the default Naive Bayes classifier.

It has been suggested that I use MEGAM. NLTK has some support for MEGAM, but all the examples I have found are for general classifiers (eg. a text classifier that uses a vector of word features), rather than a more specific POS tagger. Without having to recreate my own POS feature extractor and compiler (ie. I prefer to use the one already in NLTK), how can I used the MEGAM MaxEnt classifier? Ie. how can I drop it in some existing MaxEnt code that is along the lines of:

maxent_tagger = ClassifierBasedPOSTagger(train=train开发者_运维百科ing_sentences,
                                        classifier_builder=MaxentClassifier.train )


This one liner should work for training a MEGAM MaxentClassifier for the ClassifierBasedPOSTagger. Of course, that assumes MEGAM is already installed (go here to download)

maxent_tagger = ClassifierBasedPOSTagger(train=train_sents, classifier_builder=lambda train_feats: MaxentClassifier.train(train_feats, algorithm='megam', max_iter=10, min_lldelta=0.1))


For the future users:

Megam is now available on MAC:

$brew tap homebrew/science
$brew install megam

If you dont have XQuartz, it might ask you to get that first. Here is the direct download link: http://xquartz.macosforge.org/downloads/SL/XQuartz-2.7.5_rc4.dmg

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜