How to perform word clustering using the k-means algorithm in Java

Please help me perform word clustering using the k-means algorithm in Java. From a set of documents, I get each word and its frequency count, but I don't know how to start the clustering. I have already searched Google and found nothing useful. Please tell me the steps to perform word clustering. I need this urgently. Thanks in advance.


"Programming Collective Intelligence" by Toby Segaran has a wonderful chapter on how to do this. The examples are in Python, but they should be easy to port to Java.


In clustering, the most important thing is to build a method that checks how "close" two things are to each other. For example, if you are interested in strings of the same length, it could look like this:

// Distance between two strings: the absolute difference of their lengths.
int calculateDistance(String s1, String s2) {
    return Math.abs(s1.length() - s2.length());
}

Then, I'm not so sure, but it can go like this: 1. choose the first k strings (they can be chosen randomly), 2. iterate over all strings and relate each one to its "nearest" chosen string.

After that, you can do something like choosing the middle of every "cluster" and starting again from those. I don't remember it 100%, but I think it is a good way to start.
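The steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the class name WordClusterSketch, the helper names nearest, middle, and cluster, and the maxIterations parameter are made up for illustration, the first k words are used as the starting centers, and k is assumed to be no larger than the number of words. Because strings cannot be averaged, the "middle" here is taken to be the member with the smallest total distance to the rest of its cluster, which strictly speaking makes this a k-medoids style variant rather than textbook k-means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordClusterSketch {

    // Distance from the answer: absolute difference in string length.
    static int calculateDistance(String s1, String s2) {
        return Math.abs(s1.length() - s2.length());
    }

    // Index of the center that is closest to the given word (ties go to the earlier center).
    static int nearest(String word, List<String> centers) {
        int best = 0;
        for (int i = 1; i < centers.size(); i++) {
            if (calculateDistance(word, centers.get(i))
                    < calculateDistance(word, centers.get(best))) {
                best = i;
            }
        }
        return best;
    }

    // "Middle" of a non-empty cluster: the member with the smallest
    // total distance to all members of that cluster.
    static String middle(List<String> cluster) {
        String best = cluster.get(0);
        long bestTotal = Long.MAX_VALUE;
        for (String candidate : cluster) {
            long total = 0;
            for (String other : cluster) {
                total += calculateDistance(candidate, other);
            }
            if (total < bestTotal) {
                bestTotal = total;
                best = candidate;
            }
        }
        return best;
    }

    // Assumes 1 <= k <= words.size().
    static List<List<String>> cluster(List<String> words, int k, int maxIterations) {
        // Step 1: use the first k words as the initial centers.
        List<String> centers = new ArrayList<>(words.subList(0, k));
        List<List<String>> clusters = new ArrayList<>();
        for (int iter = 0; iter < maxIterations; iter++) {
            // Step 2: relate every word to its nearest center.
            clusters = new ArrayList<>();
            for (int i = 0; i < k; i++) {
                clusters.add(new ArrayList<>());
            }
            for (String w : words) {
                clusters.get(nearest(w, centers)).add(w);
            }
            // Step 3: pick the middle of each cluster as its new center.
            List<String> newCenters = new ArrayList<>();
            for (int i = 0; i < k; i++) {
                List<String> c = clusters.get(i);
                newCenters.add(c.isEmpty() ? centers.get(i) : middle(c));
            }
            // Stop once the centers no longer change.
            if (newCenters.equals(centers)) {
                break;
            }
            centers = newCenters;
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("a", "elephant", "to", "giraffes", "at", "kangaroo");
        // prints [[a, to, at], [elephant, giraffes, kangaroo]]
        System.out.println(cluster(words, 2, 10));
    }
}
```

With the length-based distance, the short words and the long words end up in separate clusters. Swapping in a different calculateDistance changes what "similar" means without touching the loop.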

And remember that the most important part is the calculateDistance() method!
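Since the question starts from words and their frequency counts, one possible alternative distance works on those counts instead of on string length. The sketch below is an assumption, not something stated in the answer: it supposes each word is represented by an int array holding its count in each document (same document order for every word) and uses plain Euclidean distance. The class name FrequencyDistanceSketch and the sample counts are made up for illustration.

```java
class FrequencyDistanceSketch {

    // Euclidean distance between two frequency vectors of equal length.
    // Each position is assumed to hold the word's count in one document.
    static double calculateDistance(int[] f1, int[] f2) {
        if (f1.length != f2.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < f1.length; i++) {
            double d = f1[i] - f2[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        int[] wordA = {3, 0, 4}; // hypothetical counts in three documents
        int[] wordB = {0, 0, 0};
        // prints 5.0
        System.out.println(calculateDistance(wordA, wordB));
    }
}
```

Words that appear with similar counts in the same documents get a small distance under this choice, so they would tend to land in the same cluster.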
