开发者

interpreting Naive Bayes results

I start using NaiveBayes/Simple classifier for classification (Weka), however I have some problems to understand while training the data. The data set I'm using is weather.nominal.arff.

interpreting Naive Bayes results

While I use use training test from the options, the classifier result is:

Correctly Classified Instances 13  -  92.8571 %    
Incorrectly Classified Instances 1 - 7.1429 %   

a b classified as  
9 0  a =yes
1 4  b = no

My first question what should I understand from the incorrect classified instances? Why such a problem occurred? which attribute collection is classified incorrect? is ther开发者_开发技巧e a way to understand this?

Secondly, when I try the 10 fold cross validation, why I get different (less) correctly classified instances?

The results are:

Correctly Classified Instances           8               57.1429 %
Incorrectly Classified Instances         6               42.8571 %

 a b   <-- classified as
 7 2 | a = yes
 4 1 | b = no


You can get the individual predictions for each instance by choosing this option from:

More Options... > Output predictions > PlainText

Which will give you in addition to the evaluation metrics, the following:

=== Predictions on training set ===

 inst#     actual  predicted error prediction
     1       2:no       2:no       0.704 
     2       2:no       2:no       0.847 
     3      1:yes      1:yes       0.737 
     4      1:yes      1:yes       0.554 
     5      1:yes      1:yes       0.867 
     6       2:no      1:yes   +   0.737 
     7      1:yes      1:yes       0.913 
     8       2:no       2:no       0.588 
     9      1:yes      1:yes       0.786 
    10      1:yes      1:yes       0.845 
    11      1:yes      1:yes       0.568 
    12      1:yes      1:yes       0.667 
    13      1:yes      1:yes       0.925 
    14       2:no       2:no       0.652 

which indicates that the 6th instances was misclassified. Note that even if you train and test on the same instances, misclassifications can occur due to inconsistencies in the data (the simplest example is having two instances with the same features but with different class label).

Keep in mind that the above way of testing is biased (its somewhat cheating since it can see the answers to the questions). Thus we are usually interested in getting a more realistic estimate of the model error on unseen data. Cross-validation is one such technique, where it partition the data into 10 stratified folds, performing the testing on one fold, while training on the other nine, finally it reports the average accuracy across the ten runs.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜