
Questions on clustering methods

Recently I started studying clustering in data mining, and I've studied sequential clustering, hierarchical clustering and k-means.

I also read a statement that distinguishes k-means from the other two clustering techniques. It says that k-means is not very good at dealing with nominal attributes, but the text didn't explain this point. So far, the only difference I can see is that with k-means we know in advance that we need exactly K clusters, while with the other two methods we don't know how many clusters we will need.

So could anybody give me some idea of why this statement exists, i.e., why k-means has this problem with examples that have nominal attributes, and is there a way to overcome it?

Thanks in advance.


The k-means algorithm calculates cluster centroids by taking the mean value of each attribute over all the points in the cluster. If an attribute is nominal, you can't take a mean value.
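A minimal Python sketch of the centroid step described above, assuming each point is a tuple of attribute values; `centroid` is a hypothetical helper, not part of any library. It averages each attribute across the cluster's points, which works for numbers but raises a `TypeError` as soon as one attribute holds nominal strings.

```python
def centroid(points):
    """Centroid of a cluster as k-means computes it: the per-attribute mean."""
    # zip(*points) groups the values of each attribute (column) together
    return tuple(sum(column) / len(column) for column in zip(*points))

# Numeric attributes: the mean is well defined.
numeric_cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(numeric_cluster))  # (3.0, 4.0)

# A nominal attribute (operating system) has no mean:
# sum() fails with a TypeError when it tries to add the strings.
nominal_cluster = [(1.0, "Windows"), (3.0, "Linux"), (5.0, "OSX")]
try:
    centroid(nominal_cluster)
except TypeError as err:
    print("cannot average a nominal attribute:", err)
```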

Sometimes nominal values can be put into some kind of order and then mapped to real values. For example, days of the week could be mapped onto the range [1.0, 7.0]. Sometimes, however, that isn't possible, for example for an attribute with the values [Windows, Linux, OSX].
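To make this concrete, here is a small Python sketch; the dictionaries `day_value`, `coding_a` and `coding_b` and the helpers `mean_code` and `dist` are illustrative names, not from any library. It maps days of the week onto [1.0, 7.0], where averaging gives a sensible answer, and then shows that for [Windows, Linux, OSX] both the distances and the averages depend entirely on an arbitrary choice of codes.

```python
# Ordinal attribute: days of the week have a natural order,
# so they can be mapped onto the range [1.0, 7.0].
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_value = {day: float(i) for i, day in enumerate(DAYS, start=1)}

def mean_code(coding, values):
    """Average the numeric codes of some category values, as k-means would."""
    return sum(coding[v] for v in values) / len(values)

# The mean of Mon, Wed and Fri is 3.0, i.e. Wed: a sensible centroid value.
avg = mean_code(day_value, ["Mon", "Wed", "Fri"])
print(avg, DAYS[int(avg) - 1])  # 3.0 Wed

# Nominal attribute: any numeric coding is arbitrary.
# These two (hypothetical) codings are equally valid.
coding_a = {"Windows": 1.0, "Linux": 2.0, "OSX": 3.0}
coding_b = {"Linux": 1.0, "OSX": 2.0, "Windows": 3.0}

def dist(coding, x, y):
    """Absolute difference between two coded values (1-D Euclidean distance)."""
    return abs(coding[x] - coding[y])

# The "distance" between Windows and OSX depends on the arbitrary coding...
print(dist(coding_a, "Windows", "OSX"))  # 2.0
print(dist(coding_b, "Windows", "OSX"))  # 1.0

# ...and averaging Windows and OSX under coding_a yields 2.0, i.e. "Linux",
# a centroid value that has no real meaning.
print(mean_code(coding_a, ["Windows", "OSX"]))  # 2.0
```

The ordinal mapping works because the order of the days carries meaning; for the operating systems, any order you pick is invented, so k-means would be clustering on an artifact of the encoding.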
