
Text classification, preprocessing included

Which is the best method for document classification if time is not a factor and we don't know how many classes there are?


To my (incomplete) knowledge, hierarchical agglomerative clustering is the best approach if you don't know how many classes there are. Most other clustering algorithms require either prior knowledge of the number of buckets or some sort of cross-validation or other experimentation to determine a sensible number of buckets.
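To make the idea concrete, here is a rough, standard-library-only sketch of agglomerative clustering on a few toy documents. Everything in it is an assumption for illustration rather than something from the question: the function names (`tokenize`, `cosine_distance`, `agglomerative_cluster`), the crude lowercase-and-split preprocessing, raw term counts as features, average linkage, and the `threshold` value. Note that the distance cut-off takes the place of a cluster count, so it is still a knob you would have to tune on real data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude preprocessing (an assumption): lowercase, keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_cluster(docs, threshold=0.7):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per document and repeatedly merges the two
    closest clusters. Merging stops once the closest pair is farther
    apart than `threshold`, so the number of clusters is not fixed up front.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [Counter(tokenize(d)) for d in docs]
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock markets fell sharply today",
    "markets and stock prices fell",
]
print(agglomerative_cluster(docs, threshold=0.7))  # [[0, 1], [2, 3]]
```

This brute-force version recomputes every linkage on each pass, which is only tolerable because the question says time is not a factor. For real corpora you would probably reach for a library implementation and better features (stop-word removal, TF-IDF weighting) instead.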


A cross link: see how-do-i-determine-k-when-using-k-means-clustering on Stack Overflow.
