
Identifying graphs in a heap of connected nodes -- what is this called?

I have a SQL table with three columns X, Y, Z. I need to split it into groups such that all records with the same value of X, or of Y, or of Z are assigned to the same group. Records that share a value of X, Y, or Z must never be split across multiple groups.

If you think of records as nodes and values of X, Y, Z as edges, this problem is the same as finding all graphs in which the nodes are connected, directly or indirectly, via an X-, Y-, or Z-edge, and in which no graph has an edge in common with any other graph (otherwise they would be part of the same graph).

A few years ago I knew what this was called and even remembered the algorithm, but now it escapes me. Please tell me what this problem is called so I can Google for a solution. If you know a good algorithm -- please point me to it. If you have a SQL implementation -- I will marry you :)

Example:

    X                   Y               Z            BUCKET
---------     ----------------      ---------      -----------
   1                   34              56              1
   54                  43              45              2
   1                   12              22              1
   2                   34              11              1

The last row is in bucket 1 because its value Y=34 is the same as in the first row, which is in bucket 1.


It looks less like a graph and more like a simplicial complex. But if we treat this complex as its skeletal graph (the numbers are the vertices, and a row in the table means that its three vertices are all connected to each other by edges), then we can use any algorithm for finding the connected components of this graph. I'm not sure whether there is a feasible way to do this in SQL, though; perhaps it would be more prudent to use a graph database somehow.

However, for this specific problem there may be an easy solution achievable in plain SQL that I haven't looked for.
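
As a rough sketch of the connected-components idea outside SQL, the Python below uses a union-find (disjoint-set) structure. It unions rows directly rather than building the value graph, which yields the same components. Two things here are assumptions rather than anything stated in the question: a value links rows only when it appears in the same column (so values are keyed by column index and value), and the rows have been loaded into memory as tuples. The name `assign_buckets` is purely illustrative.

```python
def assign_buckets(rows):
    """Return one bucket id per row; rows sharing a value in the same
    column (directly or through a chain of rows) get the same id.

    rows: list of (x, y, z) tuples (hypothetical in-memory copy of the table).
    Bucket ids are 1-based and numbered in order of first appearance.
    """
    parent = list(range(len(rows)))  # each row starts as its own root

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the earlier row as root so numbering stays stable.
            parent[max(ra, rb)] = min(ra, rb)

    # Assumption: X=34 and Y=34 are different things, so key by column.
    first_row_with = {}
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            key = (col, value)
            if key in first_row_with:
                union(i, first_row_with[key])
            else:
                first_row_with[key] = i

    bucket_of_root = {}
    buckets = []
    for i in range(len(rows)):
        root = find(i)
        if root not in bucket_of_root:
            bucket_of_root[root] = len(bucket_of_root) + 1
        buckets.append(bucket_of_root[root])
    return buckets


# The four rows from the question's example table.
example = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(assign_buckets(example))  # [1, 2, 1, 1]
```

The bucket ids could then be written back to the table with ordinary UPDATE statements; how practical that is depends on the table size and the database in use.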


To find how many rows fall in each group of X:

select x, count(x)
from mytable
group by x;

Or to list the distinct values of X:

select distinct x from mytable;


Why don't you start with a GROUP BY on one of the columns (say X) to make the initial buckets, then do the same for Y and Z, each time merging buckets from the previous step whenever a new group ties them together?

Repeat the process for X, Y, and Z until the buckets stop changing.
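
The repeat-until-stable procedure above can be sketched in Python as follows. This is only one way to read the suggestion, not a tested SQL solution: every row starts in its own bucket, each grouping step pulls all rows that share a column value down to the lowest bucket id among them, and the loop ends when a full pass over X, Y, and Z changes nothing. The function name and the in-memory list of tuples are hypothetical.

```python
def buckets_by_repeated_grouping(rows):
    """Assign bucket ids by regrouping on each column until stable.

    rows: list of (x, y, z) tuples (hypothetical in-memory copy of the table).
    Each bucket id ends up as the 1-based position of the earliest row in
    that bucket, so ids are not necessarily consecutive.
    """
    bucket = list(range(1, len(rows) + 1))  # every row starts alone
    changed = True
    while changed:
        changed = False
        for col in range(3):  # X, then Y, then Z
            # The "GROUP BY col" step: lowest bucket id seen per value.
            lowest = {}
            for i, row in enumerate(rows):
                value = row[col]
                if value not in lowest or bucket[i] < lowest[value]:
                    lowest[value] = bucket[i]
            # The merge step: move every row in the group to that bucket.
            for i, row in enumerate(rows):
                if bucket[i] != lowest[row[col]]:
                    bucket[i] = lowest[row[col]]
                    changed = True
    return bucket


# The four rows from the question's example table.
example = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(buckets_by_repeated_grouping(example))  # [1, 2, 1, 1]
```

Bucket ids only ever decrease, so the loop always terminates. In the worst case it needs about as many passes as the longest chain of rows linked one to the next, which is the main cost of this approach compared with a union-find structure.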

Are you working for LinkedIn or Facebook? :)
