Is NLTK's naive Bayes Classifier suitable for commercial applications?
I need to train a naive Bayes classifier on two corpora of approx. 15,000 tokens each. I'm using a basic bag-of-words feature extractor with binary labeling, and I'm wondering whether NLTK is powerful enough to handle this data without significantly slowing down run time if the application gains many users. The program would basically be classifying a regular stream of text messages from potentially thousands of users. Are there other machine learning packages you'd recommend integrating with NLTK if it isn't suitable?
Your corpora are not very big, so NLTK should do the job. However, I wouldn't recommend it in general: it is quite slow and buggy in places. Weka is a more powerful tool, but because it can do so much more, it is also harder to understand. If naive Bayes is all you plan to use, it would probably be fastest to code it yourself.
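To give an idea of how little code a hand-rolled classifier needs, here is a minimal sketch in plain Python. It assumes binary bag-of-words features (each message reduced to the set of tokens it contains), whitespace tokenization, and add-one (Laplace) smoothing; the class name `BinaryNaiveBayes` and the toy training data are hypothetical, made up for illustration, and not part of NLTK or any other library.

```python
import math
from collections import Counter, defaultdict


class BinaryNaiveBayes:
    """Hypothetical minimal naive Bayes over binary bag-of-words features.

    Each document counts as the *set* of its tokens, so a word contributes
    at most once per document. Add-one smoothing avoids zero probabilities.
    """

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training docs
        self.word_counts = defaultdict(Counter)   # label -> word -> docs containing it
        self.total_words = Counter()              # label -> sum of word_counts[label]
        self.vocab = set()

    def train(self, labeled_docs):
        for tokens, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in set(tokens):
                self.word_counts[label][word] += 1
                self.total_words[label] += 1
                self.vocab.add(word)

    def classify(self, tokens):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each word present.
            score = math.log(count / n_docs)
            for word in set(tokens):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                numer = self.word_counts[label][word] + 1
                denom = self.total_words[label] + v
                score += math.log(numer / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up data and two labels.
train_data = [
    ("great phone love it".split(), "pos"),
    ("love the battery".split(), "pos"),
    ("terrible screen hate it".split(), "neg"),
    ("hate the battery".split(), "neg"),
]
clf = BinaryNaiveBayes()
clf.train(train_data)
print(clf.classify("love this phone".split()))  # -> pos
```

Training is a single pass of counting and classification is one dictionary lookup per token per label, which is why a hand-rolled version tends to be fast; a real deployment would still need proper tokenization and persistence of the counts.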
EDIT (much later):
Try scikit-learn; it is very easy to use.
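As a rough sketch of what that looks like, the pipeline below pairs scikit-learn's `CountVectorizer(binary=True)` (binary bag-of-words features, similar to the extractor in the question) with `BernoulliNB`. The choice of `BernoulliNB` rather than `MultinomialNB`, and the toy texts and labels, are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up training data with two labels.
texts = [
    "great phone love it",
    "love the battery",
    "terrible screen hate it",
    "hate the battery",
]
labels = ["pos", "pos", "neg", "neg"]

# binary=True records word presence/absence instead of counts.
model = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
model.fit(texts, labels)

# predict() accepts a batch of raw strings, which suits a stream of messages.
predictions = model.predict(["love this phone", "hate the screen"])
print(predictions)
```

Because `predict` works on batches, incoming messages can be buffered and classified together rather than one call per message.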