how to perform word clustering using k-means algorithm in java
Please help me how to perform word clustering using k-means algorithm in java. From the set of documents, I get word and its frequency count. Then i dont know how to start for clustering.I already search google. But no idea. 开发者_开发百科Please tell me steps to perform word clustering. Very needful now. Thanks in advance.
"Programming Collective Intelligence" by Toby Segaran has a wonderful chapter on how to do this. The examples are in Python, but they should be easy to port to Java.
In clustering most important thing is to build a method, which check how to things (for example) are "close" together. E.g. is you are interested in string with same lang, this could be like:
int calculateDistance(String s1, String s2) {
return Math.abs(s1.length() - s2.length());
}
Then I'm not so sure, but in can be like this: 1. choose (can be randomly) first k string, 2. iterate for all string, and relate them to their "nearest" string.
Then can be something, like choosing from every "cluster" middle of it, and start it again. I don't remember it for 100% but I thing it is good way to start.
And remember, that most important is the method calculateDistance()!
精彩评论