Seed selection strategies for K-means

I wonder what kind of seed selection methods I can apply to the K-means algorithm. A Google search wasn't that helpful. Any suggestions?


The seeds depend on the domain. For example, if your data items are words, the most frequent words make reasonable seeds. Otherwise, you could cluster a small random sample of the data and use the resulting centroids as seeds.
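A rough sketch of the "cluster a small sample" idea, in plain Python. The function names (seeds_from_sample, lloyd), the sample size, and the fixed iteration count are all made up for illustration, and the toy data is generated on the spot; treat it as one possible way to do it, not a reference implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def lloyd(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def seeds_from_sample(points, k, sample_size, rng):
    """Cluster a small random sample and return its centroids as seeds.

    Assumes sample_size >= k (hypothetical helper, not from any library).
    """
    sample = rng.sample(points, min(sample_size, len(points)))
    initial = rng.sample(sample, k)   # random starting points for the cheap run
    return lloyd(sample, initial)

# Toy data: three 2-D blobs of 200 points each.
rng = random.Random(0)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (10, 10), (0, 10)]
        for _ in range(200)]

seeds = seeds_from_sample(data, k=3, sample_size=60, rng=rng)
```

The returned seeds would then be passed as the initial centroids to a k-means run over the full data set. The cheap run on the sample can still land in a poor local optimum, so this is a heuristic, not a guarantee.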

Here is an example of a more sophisticated algorithm:

K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G.R. Sridhar. Single Pass Seed Selection Algorithm for k-Means. Journal of Computer Science 6 (1): 60-66, 2010.


Search for "supervised" k-means clustering and k-means++. Also specify your performance needs: what is your k, and how many input points do you have?
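For reference, k-means++ seeding picks the first seed uniformly at random, then picks each further seed with probability proportional to its squared distance from the nearest seed chosen so far. A minimal sketch in plain Python follows; the function name kmeans_pp_seeds and the toy data are my own, not from any library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng):
    """Choose k seeds with k-means++ style D^2 weighting (illustrative helper)."""
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed so far.
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with an existing seed; fall back to uniform.
            seeds.append(rng.choice(points))
        else:
            # Far-away points are proportionally more likely to be picked.
            seeds.append(rng.choices(points, weights=d2, k=1)[0])
        d2 = [min(d, sq_dist(p, seeds[-1])) for d, p in zip(d2, points)]
    return seeds

# Toy data: three distinct locations, each repeated five times.
points = [(0, 0)] * 5 + [(10, 10)] * 5 + [(0, 10)] * 5
seeds = kmeans_pp_seeds(points, k=3, rng=random.Random(42))
```

On this toy input, points that already coincide with a seed get weight zero, so the three seeds end up at the three distinct locations. On real data the choice is random, and it is common to run the seeding several times and keep the best clustering.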

In general, a few thousand points can easily be clustered with a naive k-means implementation, so I would try that first.

Also, if you're not sure what k should be, try MCL clustering first to get a good estimate.
