开发者

How to identify clusters of nodes in a network

I have a table describing several sets of connected nodes:

node
origin_node REFERENCES node
start_time
end_time

and I want to find out how many clusters the datas开发者_运维知识库et contains, e.g. if the records were:

A, B, 10:00, 11:00
B, C, 9:00, 9:15
D, E, 10:00, 10:15
B, A, 13:00, 13:30
E, B, 12:00, 13:20
F, G, 9:00, 9:15

...then I'd have 2 clusters {A,B,C,D,E} and {F,G}

(the times are pretty much irrelevant - it's just there to demonstrate that node+origin_node is not necessarily unique / ordered).

But I'm a bit stuck in working out an algorithm which identifies the clusters from a few thousand rows.

I'm working with MySQL 5.0.22 - so no 'CONNECT BY', and have access to PHP and awk - although it'd be easier for me to understand an algorithm rather than a coded solution. And as long as it takes less than a couple of hours to analyse the data, I'd lean to simplicity over order.

BTW: its a real-world problem - not homework (I stopped being a student a long time ago - perhaps too early ;)

TIA


it'd be easier for me to understand an algorithm rather than a coded solution

Tried these links?

http://en.wikipedia.org/wiki/Cluster_analysis

http://en.wikipedia.org/wiki/Category:Data_clustering_algorithms

Also, though not MySQL, there also is stuff on Microsoft's site:

http://msdn.microsoft.com/en-us/library/ms174879.aspx


Edit, per your comment:

In your particular case, something akin to creating a closure table might work.

Using a temporary table...

Start with an arbitrary node. Assign it to a new cluster.

Next node. Is there a link to a node from a currently identified cluster?

  • If no, assign it to a new cluster.

  • If yes, assign it to that cluster. Then, for each link, verify that already processed node is in the same cluster. If not, reassign them to that cluster.


Went with walking the network and flagging visited nodes (similar to garbage collection algorithms). Its reasonably efficient but needed quite a bit of code.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜