Wondering if a Bayes classifier is the right approach?
I'm wondering if a Bayes classifier makes sense for an application where the same phrase, "served cold" (for example), is "good" when associated with some things (beer, soda) but "bad" when associated with other things (steak, pizza, burger).
Specifically, I'm wondering whether training a Bayes classifier that "beer cold" and "soda cold" are "good" cancels out training it that "steak served cold" and "burger served cold" are "bad".
Or, can Bayes (correctly) be trained that "served cold" might be "good" or "bad" depending on what it is associated with?
I found a lot of good info on Bayes, here and elsewhere, but was unable to determine whether it's suitable for this type of application, where the answer to a phrase being good or bad is "it depends".
A Naive Bayes classifier assumes independence between attributes. For example, assume you have the following data:
apple fruit red BAD
apple fruit green BAD
banana fruit yellow GOOD
tomato vegetable red GOOD
Independence means that the attributes (name, type, color) are independent; for example, that "apple" could appear with either "fruit" or "vegetable". In this case the attributes "name" and "type" are dependent, so a Naive Bayes classifier is too naive (it would likely classify "apple fruit yellow" as BAD because it's an apple AND it's a fruit -- but aren't all apples fruits?).
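To see how the independence assumption plays out numerically, here is a minimal sketch (the names `DATASET` and `classify` are my own, not from any library) of a Naive Bayes classifier with add-one smoothing over the toy data above:

```ruby
# Toy training data from above: each example is [name, type, color] plus a class.
DATASET = [
  [%w[apple fruit red],      :bad],
  [%w[apple fruit green],    :bad],
  [%w[banana fruit yellow],  :good],
  [%w[tomato vegetable red], :good]
]

# Naive Bayes: score each class as prior * product of per-attribute
# likelihoods, treating the attributes as independent given the class.
def classify(attrs)
  classes = DATASET.map(&:last).uniq
  scores = classes.map do |klass|
    examples = DATASET.select { |_, c| c == klass }
    prior = examples.size.to_f / DATASET.size
    likelihood = attrs.each_with_index.reduce(1.0) do |prod, (value, i)|
      vocab = DATASET.map { |ex, _| ex[i] }.uniq.size       # distinct values of attribute i
      count = examples.count { |ex, _| ex[i] == value }     # matches within this class
      prod * (count + 1.0) / (examples.size + vocab)        # add-one (Laplace) smoothing
    end
    [klass, prior * likelihood]
  end
  scores.max_by(&:last).first
end

classify(%w[apple fruit yellow])  # => :bad, dragged down by "apple" and "fruit"
```

Working through the numbers: BAD scores 0.5 * 3/5 * 3/4 * 1/5 = 0.045 while GOOD scores 0.5 * 1/5 * 1/2 * 2/5 = 0.02, so the yellow apple comes out BAD even though "yellow" only ever appeared with GOOD.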
To answer your original question, a Naive Bayes classifier assumes that the class (GOOD or BAD) depends upon each attribute independently, which isn't the case -- I like my pizza hot and my soda cold.
EDIT: If you're looking for a classifier that has some utility even though it can, in theory, produce numerous Type I and Type II errors, Naive Bayes is such a classifier. Naive Bayes is better than nothing, but there's measurable value in using a less naive classifier.
I wouldn't dismiss Bayes as quickly as Daniel suggests. The quality (performance, in math-speak) of a Bayes classifier depends above all on the amount and quality of training data, and on the assumptions you make when you develop your algorithm.
To give you a short example: if you feed it only {'beer cold' => :good, 'pizza cold' => :bad}, the word 'cold' won't actually affect classification. It will just decide that all beers are good and all pizzas are bad (see how smart it is? :))
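A minimal sketch of that situation (the names `PHRASES` and `score` are hypothetical, not from any library): with one 'cold' in each class, the smoothed likelihoods for 'cold' are identical, so only 'beer' and 'pizza' move the score.

```ruby
# The two-phrase training set from above.
PHRASES = { 'beer cold' => :good, 'pizza cold' => :bad }

# Bag-of-words counts for one class.
def word_counts(klass)
  PHRASES.select { |_, c| c == klass }.keys.flat_map(&:split).tally
end

# Unnormalized Naive Bayes likelihood of a phrase given a class,
# with add-one smoothing over the whole vocabulary.
def score(phrase, klass)
  counts = word_counts(klass)
  total  = counts.values.sum
  vocab  = PHRASES.keys.flat_map(&:split).uniq.size
  phrase.split.reduce(1.0) do |prod, word|
    prod * (counts.fetch(word, 0) + 1.0) / (total + vocab)
  end
end

score('cold', :good) == score('cold', :bad)                         # => true: 'cold' is neutral
score('beer served cold', :good) > score('beer served cold', :bad)  # => true: 'beer' decides it
```

In other words, with this training set 'cold' cancels itself out exactly, and the classifier is effectively keying on the nouns.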
Anyway, this answer is too short to explain everything in detail, so I'd recommend reading Paul Graham's essay on how he developed his spam filter -- note that he built his own algorithm based on Bayes rather than using an off-the-shelf classifier. In my (so far short) experience, it seems you're better off following him in developing a version of the algorithm specific to the problem at hand, so you have control over the various domain-specific assumptions.
You can follow my attempts (in ruby) here if you are interested: http://arubyguy.com/2011/03/03/bayes-classification-update/