Text classification, preprocessing included
Which is the best method for document classification if time is not a factor and we don't know how many classes there are?
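In my (incomplete) knowledge, hierarchical agglomerative clustering is the best approach if you don't know how many classes there are. Most of the other clustering algorithms either require prior knowledge of the number of buckets or need some sort of cross-validation or other experimentation to determine a sensible number of buckets. Agglomerative clustering starts with every document in its own cluster and repeatedly merges the two closest clusters, so you can stop merging at a distance cutoff instead of fixing the number of clusters in advance.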
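To make that concrete, here is a minimal sketch in plain Python, not a tuned implementation. It assumes very simple preprocessing (lowercasing and splitting on letters), raw term counts, cosine distance, and average linkage. The function names and the `threshold` value of 0.8 are my own illustrative choices, not anything from the question, and the cutoff would need adjusting for real data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Minimal preprocessing: lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_distance(a, b):
    """Cosine distance between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_clusters(docs, threshold=0.8):
    """Average-linkage agglomerative clustering.

    Merging stops once the closest pair of clusters is further apart
    than `threshold` (an assumed cutoff), so the number of clusters is
    a result of the procedure rather than an input.
    Returns a list of clusters, each a sorted list of document indices.
    """
    vectors = [Counter(tokenize(d)) for d in docs]
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per document

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if d > threshold:
            break  # nothing left that is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


docs = [
    "the cat sat on the mat",
    "a cat and a kitten sat together",
    "stock markets fell sharply today",
    "markets and stock prices fell",
]
print(agglomerative_clusters(docs, threshold=0.8))
```

On these four toy sentences the two cat sentences end up together and the two market sentences end up together, without the number of groups ever being specified. This naive version recomputes linkages on every pass, which is slow for large collections, but that matters little if time is not a factor. You still have to pick the cutoff, so the choice of "how many clusters" is moved into a distance threshold rather than eliminated.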
A cross link: see how-do-i-determine-k-when-using-k-means-clustering on SO.