开发者

recursive query problem - retrieving cluster of directed connections

Trying not to select same connection multiple times I have tried to use charindex() = 0 condition the following way:

WITH Cluster(calling_party, called_party, link_strength, Path)
AS
(SELECT
    calling_party,
    called_party,
    link_strength,
    CONVERT(varchar(max), calling_party + '.' + called_party) AS Path
FROM
    monthly_connections_test
WHERE
    link_strength > 0.1 AND
    calling_party = 'b'
UNION ALL
SELECT
    mc.calling_party,
    mc.called_party,
    mc.link_strength,
    CONVERT(varchar(max), cl.Path + '.' + mc.calling_party + '.' + mc.called_party) AS Path
FROM
    monthly_connections_test mc
INNER JOIN Cluster cl ON
    (
   开发者_开发知识库     mc.called_party = cl.called_party OR
        mc.called_party = cl.calling_party
    ) AND
    (
        CHARINDEX(cl.called_party + '.' + mc.calling_party, Path) = 0 AND
        CHARINDEX(cl.called_party + '.' + mc.called_party, Path) = 0
    )
WHERE
    mc.link_strength > 0.1
)
SELECT
    calling_party,
    called_party,
    link_strength,
    Path
FROM
    Cluster OPTION (maxrecursion 30000)

The condition however does not fulfill its purpose as the same rows are selected multiple times.

The actual aim here is to retrieve the whole cluster of connections to which the selected user (in the example user b) belongs.

Edit1:

I tried to modify the query the following way:

With combined_users AS
 (SELECT calling_party CALLING, called_party CALLED, link_strength FROM dbo.monthly_connections_test WHERE link_strength > 0.1),
 related_users1 AS
 (
 SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLING = 'a1'
 UNION ALL
 SELECT c.CALLING, c.CALLED, c.link_strength,
    convert(varchar(max),r.path  + c.CALLED + '.') path 
        from combined_users c 
        join related_users1 r 
        ON (c.CALLING = r.CALLED) and CHARINDEX(c.CALLING + '.' + c.CALLED + '.', r.path)= 0 

        ),
related_users2 AS
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLED = 'a1'
UNION ALL
 SELECT c.CALLING, c.CALLED, c.link_strength,
    convert(varchar(max),r.path  + c.CALLING + '.') path 
        from combined_users c 
        join related_users2 r 
        ON c.CALLED = r.CALLING and CHARINDEX('.' + c.CALLING + '.' + c.CALLED, r.path)= 0
)
        SELECT CALLING, CALLED, link_strength, path FROM
        (SELECT CALLING, CALLED, link_strength, path FROM related_users1 UNION SELECT CALLING, CALLED, link_strength, path FROM related_users2) r OPTION (MAXRECURSION 30000)

To test the query I created the cluster below:

recursive query problem - retrieving cluster of directed connections

The query above returned the following table:

a1  a2  1.0000000   .a1.a2.
a11 a13 1.0000000   .a12.a1.a13.a11.
a12 a1  1.0000000   .a12.a1.
a13 a12 1.0000000   .a12.a1.a13.
a14 a13 1.0000000   .a12.a1.a13.a14.
a15 a14 1.0000000   .a12.a1.a13.a14.a15.
a2  a10 1.0000000   .a1.a2.a10.
a2  a3  1.0000000   .a1.a2.a3.
a3  a4  1.0000000   .a1.a2.a3.a4.
a3  a6  1.0000000   .a1.a2.a3.a6.
a4  a8  1.0000000   .a1.a2.a3.a4.a8.
a4  a9  1.0000000   .a1.a2.a3.a4.a9.

The query obviously returns connections toward the selected node and the connections in the opposite direction. The problem is the change of direction: for example the connections a7, a4 and a11, a10 are not selected because the direction is changed (relative to starting node).

Does anyone know how to modify the query in order to include all connections?

Thank you


Okay, there are several things to discuss here.

Zerothly, i have PostgreSQL, so all this is done with that; i'm trying to only use standard SQL, so this should work okay with SQL Server too.

Firstly, if you're only interested in calls with link strength greater than 0.1, let's say:

-- like calls, but only strong enough to be interesting
create view strong_calls (calling_party, called_party, link_strength)
as (
  select calling_party, called_party, link_strength
  from monthly_connections_test
  where link_strength > 0.1
);

And from now on, we'll talk in terms of this table.

Secondly, you say:

The actual aim here is to retrieve the whole cluster of connections to which the selected user (in the example user b) belongs.

If that's true, why are you computing the path? If you just want to know the set of connections, you can do:

with recursive cluster (calling_party, called_party, link_strength)
as (
  (
    select calling_party, called_party, link_strength
    from strong_calls
    where calling_party = 'b'
  )
  union
  (
    select c.calling_party, c.called_party, c.link_strength
    from cluster this, strong_calls c
    where c.calling_party = this.called_party
    or c.called_party = this.calling_party
  )
)
select *
from cluster;

Thirdly, perhaps you don't really want to find what connections are in the cluster, you want to find which people are in the cluster, and what the shortest path to them from the target is. In that case, you can do:

with recursive cluster (party, path)
as (
  select cast('b' as character varying), cast('b' as character varying)
  union
  (
    select (case
      when this.party = c.calling_party then c.called_party
      when this.party = c.called_party then c.calling_party
    end), (this.path || '.' || (case
      when this.party = c.calling_party then c.called_party
      when this.party = c.called_party then c.calling_party
    end))
    from cluster this, strong_calls c
    where (this.party = c.calling_party and position(c.called_party in this.path) = 0)
    or (this.party = c.called_party and position(c.calling_party in this.path) = 0)
  )
)
select party, path
from cluster
where not exists (
  select *
  from cluster c2
  where cluster.party = c2.party
  and (
    char_length(cluster.path) > char_length(c2.path)
    or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
  )
)
order by party, path;

As you can see, you were very much on the right track.

If you actually do want a list of all the calls in the cluster, with paths, then, er, i'll get back to you!

EDIT: Bear in mind that the queries which don't construct a path will have very different performance characteristics to those which do. Roughly speaking, the non-path queries will do O(n) work (probably in about O(log n) iteration steps), because they visit each node in the cluster, but the path-constructing steps will do much more - O(n!) maybe? - because they have to visit every path through the graph. You'll be alright if clusters are as big as in your examples, but if they're much bigger, you may find the runtime is prohibitive.


charindex('b.d','b.c.d.b')=0 because there's a 'c.' in between

Easier to read:

WITH cluster(calling_party, called_party, link_strength, PATH) 
     AS (SELECT calling_party, 
                called_party, 
                link_strength, 
                CONVERT(VARCHAR(MAX), calling_party + '.' + called_party) AS 
                PATH 
         FROM   monthly_connections_test 
         WHERE  link_strength > 0.1 
                AND calling_party = 'b' 
         UNION ALL 
         SELECT mc.calling_party, 
                mc.called_party, 
                mc.link_strength, 
                CONVERT(VARCHAR(MAX), cl.PATH + '.' + mc.calling_party + '.' + 
                mc.called_party) 
                AS PATH 
         FROM   monthly_connections_test mc 
                INNER JOIN cluster cl 
                  ON ( mc.called_party = cl.called_party 
                        OR mc.called_party = cl.calling_party ) 
                     AND ( Charindex(cl.called_party + '.' + mc.calling_party, 
                           PATH) 
                           = 0 
                           AND Charindex(cl.called_party + '.' + 
                               mc.called_party, 
                               PATH) 
                               = 
                               0 ) 
         WHERE  mc.link_strength > 0.1) 
SELECT calling_party, 
       called_party, 
       link_strength, 
       PATH 
FROM   cluster 
OPTION (MAXRECURSION 30000) 


To address your edited question, if you want to ignore directionality of links, try:

create view symmetric_users (calling_party, called_party, link_strength)
as (
  select calling_party, called_party, link_strength from monthly_connections_test
  union
  select called_party , calling_party, link_strength from monthly_connections_test
)

Then point your query at that.

If you have users who call each other, you will have two rows for each ordered pair of users. You should be able to filter those out by choosing the stronger.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜