Building/Running a Streaming Weka Text Classifier in Java

We have been using the Weka Explorer GUI to build a few classifier models. Now that testing is complete, we would like to use the chosen model within a Java application so it can classify new messages.

So for each new message we need to tokenize it, match its tokens against the tokens used to build the word vector for the model, and then pass the resulting word vector to the model.

How should we go about this process? Are there any examples available?

How do we deal with new tokens (i.e. words that appear in new text messages which are not a part of the word vector used to build the model)?
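One way to picture the matching step, as a plain-Java sketch rather than Weka's own API: fix the token-to-index map at training time, then count only those tokens of a new message that appear in that map, so unseen words simply contribute nothing. The class name `VocabularyVectorizer`, the regex-based tokenization and the sample vocabulary are all hypothetical. As far as I know, Weka's `StringToWordVector` filter behaves in a similar way once its dictionary has been built from the training data, and wrapping it in a `FilteredClassifier` keeps the filter and the model together.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Weka): maps a message onto a fixed training vocabulary.
class VocabularyVectorizer {
    // Token -> column index, frozen when the model was trained.
    private final Map<String, Integer> indexByToken = new LinkedHashMap<>();

    VocabularyVectorizer(List<String> trainingVocabulary) {
        for (String token : trainingVocabulary) {
            indexByToken.putIfAbsent(token, indexByToken.size());
        }
    }

    // Assumed tokenization: lower-case, split on anything that is not a-z or 0-9.
    static String[] tokenize(String message) {
        return Arrays.stream(message.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // Returns term counts aligned with the training vocabulary.
    // Tokens that were never seen in training are skipped.
    double[] vectorize(String message) {
        double[] vector = new double[indexByToken.size()];
        for (String token : tokenize(message)) {
            Integer index = indexByToken.get(token);
            if (index != null) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        VocabularyVectorizer v =
                new VocabularyVectorizer(Arrays.asList("free", "prize", "meeting"));
        // "claim", "your" and "gift" are not in the vocabulary and are ignored.
        System.out.println(Arrays.toString(v.vectorize("Free prize! Claim your FREE gift")));
        // prints [2.0, 1.0, 0.0]
    }
}
```

The point of the design is that the vector length and column order never change after training, which is what a trained model expects of its input.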

For the classifier preprocessing/tokenising we are using the NGram Tokenizer, Stemmer and IDF Transform. So we need to figure out how to do these steps before we can create a new instance based on the text we would like to classify.
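To make those preprocessing steps concrete, here is a rough plain-Java sketch of word n-gram tokenization followed by IDF weighting. It is not Weka's implementation: the class `NGramIdf`, its method names and the sample messages are hypothetical, stemming is left out, and the weight uses the common form tf * ln(N / df), which may differ in detail from what Weka computes. The important part is that the document frequencies come from the training messages only and are then reused unchanged for each new message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not Weka code): word n-grams plus IDF weights learned from training text.
class NGramIdf {
    // All word n-grams of sizes minN..maxN, in order of increasing size.
    static List<String> nGrams(String text, int minN, int maxN) {
        List<String> grams = new ArrayList<>();
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        if (trimmed.isEmpty()) {
            return grams;
        }
        String[] words = trimmed.split("\\s+");
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= words.length; i++) {
                grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
            }
        }
        return grams;
    }

    // idf(t) = ln(N / df(t)), computed from the training documents only.
    static Map<String, Double> idf(List<String> trainingDocs, int minN, int maxN) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : trainingDocs) {
            // A set, so each n-gram counts at most once per document.
            for (String gram : new HashSet<>(nGrams(doc, minN, maxN))) {
                docFreq.merge(gram, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((double) trainingDocs.size() / e.getValue()));
        }
        return idf;
    }

    // Weights a new message as tf * idf; n-grams unseen in training are dropped.
    static Map<String, Double> weigh(String message, Map<String, Double> idf, int minN, int maxN) {
        Map<String, Double> weights = new HashMap<>();
        for (String gram : nGrams(message, minN, maxN)) {
            Double w = idf.get(gram);
            if (w != null) {
                weights.merge(gram, w, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> training = Arrays.asList("free prize now", "team meeting now");
        Map<String, Double> idf = idf(training, 1, 2);
        System.out.println(weigh("free free lunch", idf, 1, 2));
    }
}
```

In this sketch a term that occurs in every training message (such as "now") gets weight zero, and "lunch" is ignored because it has no IDF value from training.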

As an aside: when building a classifier in the Explorer, under "More options" there is a checkbox labelled "Output classifier code", which sounds like it outputs Java source code to build and use the model. However, this option is disabled. I tested it with a number of different classifiers (RF, NB) and it doesn't change. I'm guessing it's not implemented for these?

Cheers!


To the best of my knowledge, you need to retrain a Weka classifier when a new training sample arrives. I am not aware of an online classification algorithm in Weka.

P.S. Weka is Java-based, so you can use its libraries in your application. Here is a good example: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code.
