开发者

state-of-the-art of classification algorithms

We know there are like a thousand of classifiers, recently I was told that, some people say adaboost is like the out of the shell one.


First, adaboost is a meta-algorithm which is used in conjunction with (on top of) your favorite classifier. Second, classifiers which work well in one problem domain often don't work well in another. See the No Free Lunch wikipedia page. So, there is not going to be AN answer to your question. Still, it might be interesting to know what people are using in practice.


Weka and Mahout aren't algorithms... they're machine learning libraries. They include implementations of a wide range of algorithms. So, your best bet is to pick a library and try a few different algorithms to see which one works best for your particular problem (where "works best" is going to be a function of training cost, classification cost, and classification accuracy).

If it were me, I'd start with naive Bayes, k-nearest neighbors, and support vector machines. They represent well-established, well-understood methods with very different tradeoffs. Naive Bayes is cheap, but not especially accurate. K-NN is cheap during training but (can be) expensive during classification, and while it's usually very accurate it can be susceptible to overtraining. SVMs are expensive to train and have lots of meta-parameters to tweak, but they are cheap to apply and generally at least as accurate as k-NN.

If you tell us more about the problem you're trying to solve, we may be able to give more focused advice. But if you're just looking for the One True Algorithm, there isn't one -- the No Free Lunch theorem guarantees that.


Apache Mahout (open source, java) seems to pick up a lot of steam.


Weka is a very popular and stable Machine Learning library. It has been around for quite a while and written in Java.


Hastie et al. (2013, The Elements of Statistical Learning) conclude that the Gradient Boosting Machine is the best "off-the-shelf" Method. Independent of the Problem you have. Definition (see page 352): An “off-the-shelf” method is one that can be directly applied to the data without requiring a great deal of timeconsuming data preprocessing or careful tuning of the learning procedure.

And a bit older meaning: In fact, Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as the “best off-the-shelf classifier in the world” (see also Breiman (1998)).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜