
Fuzzy c-means for categorical data

Can fuzzy c-means be applied to non-numerical data sets, i.e. categorical data or mixed numerical and categorical data? If yes (I hope so :( ):

  • How do we calculate the cluster centers?

If not, what is the alternative? How can such data be fuzzy clustered?

I need an answer, please help.

NOTE: I've used the Jaccard coefficient to calculate the distance between two points, but I still don't see how to calculate the cluster centers (see the attachments).


You'll have to transform your data into a numeric form. There are various ways of doing that, two of them being:

  • use vectors of feature counts (common in, e.g., text categorization)
  • use a one-hot representation, where a categorical feature that can take on n distinct values is represented as a string of n bits, with only the i'th bit set if the feature has the i'th value in its allowed range.
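As a rough illustration of the one-hot option, here is a minimal sketch using only the Python standard library. The function name `one_hot_encode` and the toy records are made up for this example; a real project would more likely rely on a library encoder.

```python
def one_hot_encode(rows):
    """Encode categorical records as lists of 0/1 values, one block per feature."""
    n_features = len(rows[0])
    # Sorted distinct values per feature fix the bit positions.
    categories = [sorted({row[j] for row in rows}) for j in range(n_features)]
    encoded = []
    for row in rows:
        bits = []
        for j, cats in enumerate(categories):
            # Exactly one bit per feature block is set.
            bits.extend(1.0 if row[j] == value else 0.0 for value in cats)
        encoded.append(bits)
    return encoded, categories


# Hypothetical toy data: (colour, size) records.
rows = [("red", "small"), ("blue", "large"), ("red", "large")]
X, categories = one_hot_encode(rows)
for row, bits in zip(rows, X):
    print(row, bits)
```

Each record becomes a vector with one block per feature, and each block contains a single 1. Numerical features in a mixed data set could simply be appended as extra columns, ideally after scaling them to a comparable range.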

Both are very common transformations that many machine learning programs do under the hood. Also, you might want to experiment with a metric other than the Euclidean one. Especially with a one-hot representation, and depending on the data, the L1 norm (Manhattan/city block distance) may be more appropriate.
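To see how the two metrics behave on one-hot vectors, here is a small sketch. The helper names and the three example vectors are invented for illustration. On one-hot data the L1 distance is exactly twice the number of features on which two records disagree, while the Euclidean distance is the square root of that quantity.

```python
import math


def l1_distance(a, b):
    """Manhattan / city block distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Ordinary L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical one-hot vectors for two features with two values each.
a = [0, 1, 0, 1]
b = [1, 0, 1, 0]  # differs from a on both features
c = [0, 1, 1, 0]  # differs from a on one feature

print(l1_distance(a, b), l1_distance(a, c))
print(euclidean_distance(a, b), euclidean_distance(a, c))
```

With L1, each mismatched feature adds the same amount to the distance, which is often the more natural reading for categorical data.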

Apart from that, just apply the given formulas to your transformed dataset.
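For the cluster centers specifically, the standard fuzzy c-means updates can be sketched as follows. This is a minimal, unoptimized version that assumes Euclidean distance and a fuzzifier m = 2; the function name, parameters, and toy data are all made up for this example, and the weighted-mean center update is the one derived for the Euclidean case, so it would need rethinking for other metrics.

```python
import random


def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means with Euclidean distance on numeric rows X."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(w)
        U.append([v / s for v in w])
    centers = []
    for _ in range(n_iter):
        # Center k is the mean of all points, weighted by u_ik ** m.
        centers = []
        for k in range(c):
            weights = [U[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([sum(weights[i] * X[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(sum((X[i][d] - centers[k][d]) ** 2
                            for d in range(dim)) ** 0.5, 1e-12)
                    for k in range(c)]
            U[i] = [1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                              for j in range(c))
                    for k in range(c)]
    return centers, U


# Hypothetical one-hot data: two features with two values each.
X = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0]]
centers, U = fuzzy_c_means(X, c=2)
for center in centers:
    print([round(v, 3) for v in center])
```

Because every center is a weighted mean of one-hot vectors, its coordinates within each feature block sum to 1. A center can therefore be read as the fuzzy proportion of each category in that cluster rather than as a single categorical value.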
