How to count the number of bidirectionally connected neighbors of A or B?
Below is the query I have created to count the number of common strongly connected (connected in both directions) neighbors of two users:
DECLARE @monthly_connections_test TABLE (
calling_party VARCHAR(50)
, called_party VARCHAR(50))
INSERT INTO @monthly_connections_test
SELECT 'z1', 'z2'
UNION ALL SELECT 'z1', 'z3'
UNION ALL SELECT 'z1', 'z4'
UNION ALL SELECT 'z1', 'z5'
UNION ALL SELECT 'z1', 'z6'
UNION ALL SELECT 'z2', 'z1'
UNION ALL SELECT 'z2', 'z4'
UNION ALL SELECT 'z2', 'z5'
UNION ALL SELECT 'z2', 'z7'
UNION ALL SELECT 'z3', 'z1'
UNION ALL SELECT 'z4', 'z7'
UNION ALL SELECT 'z5', 'z1'
UNION ALL SELECT 'z5', 'z2'
UNION ALL SELECT 'z7', 'z4'
UNION ALL SELECT 'z7', 'z2'
SELECT t1.user1, t1.user2,
0 AS calling_calling, 0 AS calling_called,
0 AS called_calling, 0 AS called_called,
COUNT(*) AS both_directions
FROM (SELECT relevant_monthly_c开发者_如何学Connections.calling_party AS user1,
relevant_monthly_connections_1.calling_party AS user2,
relevant_monthly_connections.called_party AS calledUser
FROM @monthly_connections_test relevant_monthly_connections
INNER JOIN @monthly_connections_test AS relevant_monthly_connections_1
ON relevant_monthly_connections.called_party = relevant_monthly_connections_1.called_party
AND relevant_monthly_connections.calling_party < relevant_monthly_connections_1.calling_party
) t1
INNER JOIN @monthly_connections_test AS relevant_monthly_connections_2
ON relevant_monthly_connections_2.called_party = t1.user1
AND relevant_monthly_connections_2.calling_party = t1.calledUser
GROUP BY t1.user1, t1.user2
Now I would like to count of strongly connected neighbors of user1 OR user2. So for example for the pair (z1, z2) the number of strongly connected neighbors is 3 (z1 is strongly connected to z2, z3, z5 and z2 is ignored as it is one of the nodes from the pair and z2 is strongly connected to z1, z5 and z7. again, z1 is ignored and count((z3, z5) U (z5, z7)) is 3).
Does anyone know how to create the query to count the number of all nodes which are strongly connected with one of the node from the pair for each pair (the query has to automatically calculate the number of all neighbors for each record)?
Edit #1:
The following query returns the table of all bidirectional connections:
WITH bidirectionalConnections AS
(
SELECT calling_party AS user1, called_party AS user2 FROM @monthly_connections_test WHERE calling_party < called_party
INTERSECT
SELECT called_party AS user2, calling_party AS user2 FROM @monthly_connections_test
)
SELECT user1, user2 FROM bidirectionalConnections
Now for each pair of nodes it has to be checked in the table bidirectionalConnections how many nodes are strongly connected to first or second node from the pair.
The pairs and the number of their neighbors in the result have to be generated automatically.
Edit #2:
Here is the picture described by the @monthly_connections_test table:
So the neighbors strongly connected to z1 OR z2 are z3, z5, z7
z1, z3: z2, z5
z1, z4: z2, z3, z5, z7
...
z1, z7: z2, z3, z4, z5
...
The result table should be in the following format:
user1, user2, total_neighbors_count
z1, z2, 3
z1, z3, 2
z1, z4, 4
...
z1, z7, 4
...
Thank you!
P.S.
I have posted similar question How to use JOIN instead of UNION to count the neighbors of “A OR B”? but it is not the same so I hope this question not to be considered as a duplicate.
I believe the query exhibited below will generate the desired result. I have structured the query to make each stage of the pipeline explicit, which has the added side-effect of giving the query optimizer a strong hint as to how to minimize intermediate row set sizes. See the comments within the query itself for the purpose of each stage.
;WITH
-- identify the strongly connected parties
-- both directions are included here for later convenience
stronglyConnected AS (
SELECT DISTINCT
l.calling_party AS party1
, l.called_party AS party2
FROM @monthly_connections_test AS l
INNER JOIN @monthly_connections_test AS r
ON r.calling_party = l.called_party
AND r.called_party = l.calling_party
)
-- identify all of the parties that participated in a strong connection
, uniqueParties AS (
SELECT DISTINCT party1 AS party FROM stronglyConnected
)
-- make all unique pairs of such parties
, allPairs AS (
SELECT
u1.party AS party1
, u2.party AS party2
FROM uniqueParties AS u1
CROSS JOIN uniqueParties AS u2
WHERE u1.party < u2.party
)
-- find the neighbours of each pair
, pairNeighbors AS (
SELECT DISTINCT
p.party1
, p.party2
, sc.party2 AS neighbor
FROM allPairs AS p
INNER JOIN stronglyConnected AS sc
ON sc.party1 IN (p.party1, p.party2)
AND sc.party2 NOT IN (p.party1, p.party2)
)
-- count the neighbours of each pair
, neighbourCounts AS (
SELECT
party1 AS user1
, party2 AS user2
, COUNT(*) AS total_neighborCount
FROM pairNeighbors
GROUP BY
party1
, party2
)
-- show the final result
SELECT * FROM neighbourCounts ORDER BY 1, 2
-- handy for testing, debugging and answering other queries:
-- SELECT * FROM stronglyConnected ORDER BY 1, 2
-- SELECT * FROM uniqueParties ORDER BY 1
-- SELECT * FROM allPairs ORDER BY 1, 2
-- SELECT * FROM pairNeighbors ORDER BY 1, 2
I think the example query you provide in the question is faulty (based on the description) - it returns z5
- z7
as a strongly connected pair, when that combination does not exist at all in the sample data. I believe this is a correct implementation:
SELECT calling.*
FROM @monthly_connections_test AS calling
WHERE EXISTS ( SELECT 1
FROM @monthly_connections_test AS called
WHERE calling.calling_party = called.called_party
AND calling.called_party = called.calling_party
)
AND calling.calling_party < calling.called_party
I've extended this implementation to provide what you want. This is not a particularly pretty solution, and should be tested on a larger data set, since it may not scale brilliantly. I've used SQL 2008 variable notation since your other question cited SQL 2008.
DECLARE @user1 varchar(50) = 'z1'
DECLARE @user2 varchar(50) = 'z2'
;WITH strongCTE
AS
(
SELECT calling.calling_party AS c1,
calling.called_party AS c2
FROM @monthly_connections_test AS calling
WHERE EXISTS ( SELECT 1
FROM @monthly_connections_test AS called
WHERE calling.calling_party = called.called_party
AND calling.called_party = called.calling_party
)
AND calling.calling_party < calling.called_party
)
SELECT COUNT(1) AS ConnectedNeighboursToUser1orUser2
FROM
(
SELECT c2
FROM strongCTE
WHERE c1 = @user1
AND c2 NOT IN (@user1,@user2)
GROUP BY c1,c2
UNION
SELECT c2
FROM strongCTE
WHERE c1 = 'z2'
AND c2 NOT IN (@user1,@user2)
GROUP BY c1,c2
) AS x
Based on Edit2, the following query gives the results that have been listed:
declare @party1 varchar(50)
declare @party2 varchar(50)
--Since we're only interested in strong connections, we can treat both parties as calling_party in the following queries
select @party1 = 'z1', @party2 = 'z7'
select
distinct mt.called_party
from
@monthly_connections_test mt
inner join
@monthly_connections_test mt2
on
mt.called_party = mt2.calling_party and
mt.calling_party = mt2.called_party
where
mt.calling_party in (@party1,@party2) and
not mt.called_party in (@party1,@party2)
To just get the count, you'd adapt to use COUNT(distinct mt.called_party)
in the select clause
The following gives all group counts for every connected pair. I think it gets trickier if we need to avoid duplicates for strongly connected pairs:
select grp.called_party,grp.calling_party,COUNT(distinct mt.called_party )
from
(select
CASE WHEN calling_party < called_party THEN calling_party ELSE called_party END as calling_party,CASE WHEN calling_party < called_party THEN called_party ELSE calling_party END as called_party FROM @monthly_connections_test) grp,
@monthly_connections_test mt
inner join
@monthly_connections_test mt2
on
mt.called_party = mt2.calling_party and
mt.calling_party = mt2.called_party
where
mt.calling_party in (grp.called_party,grp.calling_party) and
not mt.called_party in (grp.called_party,grp.calling_party)
group by grp.called_party,grp.calling_party
精彩评论