WEKA - Classifying New Data from Java - IDF Transform
We are trying to use a WEKA classifier from inside a Java program. So far everything works well. However, when building the classifier from the training set in the Weka GUI, we used the StringToWordVector IDF transform to help improve classification accuracy.
From within Java, how do I calculate the IDF-transformed value to set for each token in a new instance before passing the instance to the classifier?
The basic code looks like this:
Instances ins = vectorize(msg);
Instances unlabeled = new Instances(train,1);
Instance inst = new Instance(unlabeled.numAttributes());
String tmp = "";
for(int i=0; i < ins.numAttributes(); i++) {
tmp = ins.attribute(i).name();
if(unlabeled.attribute(tmp)!=null)
inst.setValue(unlabeled.attribute(tmp), 1.0); //TODO: Need to figure out the IDF transformed value to put here NOT 1!!
}
unlabeled.add(inst);
unlabeled.setClassIndex(classIdx);
.....cl.distributionForInstance(unlabeled.instance(i));
So how do I code this so that the new instance I want to classify gets the correct value?
To be clear, the line inst.setValue(unlabeled.attribute(tmp), 1.0); needs to be changed from 1.0 to the IDF-transformed number.
Use FilteredClassifier for this. A code snippet:
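import weka.classifiers.functions.SMO;
import weka.classifiers.meta.FilteredClassifier;
import weka.filters.unsupervised.attribute.StringToWordVector;

StringToWordVector strWVector = new StringToWordVector();
strWVector.setIDFTransform(true); // enable the IDF transform that was used in the GUI
FilteredClassifier fcls = new FilteredClassifier();
fcls.setFilter(strWVector);
fcls.setClassifier(new SMO());
fcls.buildClassifier(yourdata); // yourdata: training Instances that still contain the raw string attribute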
//rest of your code
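This is much easier because you can pass all your instances at once. FilteredClassifier builds the filter on the training data and applies that same filter to every instance you classify, so it takes care of the other details. The code is not tested, but it will get you started.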
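For reference, the StringToWordVector documentation describes the IDF transform as fij*log(num of Docs/num of Docs with word i), where fij is the frequency of word i in document j. The sketch below only illustrates that formula with plain Java collections. It is not Weka's code, the class and method names (IdfSketch, documentFrequencies, idfWeight, weigh) are made up for illustration, and it assumes presence/absence values (fij = 1) with no TF transform or length normalization. In practice, letting the filter compute the values is less error-prone than reimplementing them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: mirrors the documented IDF formula, not Weka's implementation.
class IdfSketch {

    // For each token, count how many training documents contain it at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> trainingDocs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : trainingDocs) {
            for (String token : new HashSet<>(doc)) {
                df.merge(token, 1, Integer::sum);
            }
        }
        return df;
    }

    // f * log(numDocs / docFreq), using the natural logarithm.
    static double idfWeight(double f, int numDocs, int docFreq) {
        return f * Math.log((double) numDocs / docFreq);
    }

    // Weight each distinct token of a new document using training-set statistics.
    // Tokens never seen in training have no attribute in the model and are skipped.
    static Map<String, Double> weigh(List<String> newDoc, Map<String, Integer> df, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (String token : new HashSet<>(newDoc)) {
            Integer docFreq = df.get(token);
            if (docFreq != null) {
                weights.put(token, idfWeight(1.0, numDocs, docFreq)); // assumes fij = 1
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> train = Arrays.asList(
                Arrays.asList("cheap", "pills", "now"),
                Arrays.asList("meeting", "now"),
                Arrays.asList("cheap", "flights"),
                Arrays.asList("lunch", "meeting"));
        Map<String, Integer> df = documentFrequencies(train);
        System.out.println(weigh(Arrays.asList("cheap", "lunch", "unknown"), df, train.size()));
    }
}
```

The point to notice is that the document counts come from the training set only. A new message contributes nothing to them, which is why the value cannot be computed from the new instance alone.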
Edit: You can also do it the following way. This code snippet is from the Weka tutorial; see http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter-Filtering%20on-the-fly (Batch Mode) for details.
Instances train = ... // from somewhere
Instances test = ... // from somewhere
Standardize filter = new Standardize();
filter.setInputFormat(train); // initializing the filter once with training set
Instances newTrain = Filter.useFilter(train, filter); // configures the Filter based on train instances and returns filtered instances
Instances newTest = Filter.useFilter(test, filter); // create new test set
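The idea behind batch mode is that the filter's statistics are fixed by the training set and then reused unchanged on any later data. A minimal plain-Java sketch of that pattern for standardization follows. The class StandardizeSketch is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption made here, not a statement about how Weka's Standardize computes it.

```java
// Illustrative only: "fit on train, apply to test" for a single numeric attribute.
class StandardizeSketch {
    final double mean;
    final double stdDev;

    // Learn mean and standard deviation from the training values only.
    // Expects at least two training values.
    StandardizeSketch(double[] train) {
        double sum = 0.0;
        for (double v : train) {
            sum += v;
        }
        mean = sum / train.length;

        double squared = 0.0;
        for (double v : train) {
            squared += (v - mean) * (v - mean);
        }
        // Assumption: sample standard deviation (n - 1 in the denominator).
        stdDev = Math.sqrt(squared / (train.length - 1));
    }

    // Apply the training statistics to any data, including unseen test values.
    double[] apply(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double centered = values[i] - mean;
            // With zero spread in training, only center the value.
            out[i] = stdDev == 0.0 ? centered : centered / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        StandardizeSketch filter = new StandardizeSketch(new double[] {2.0, 4.0, 6.0});
        double[] newTest = filter.apply(new double[] {8.0});
        System.out.println(newTest[0]);
    }
}
```

The same reasoning applies to StringToWordVector: the dictionary and document counts are learned from train, and test instances are only transformed with them.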
HTH