Any suggestion for optimizing the following query which counts common and all neighbors?
The table consists of columns calling_party and called_party and the record describes the connection between two users where one plays a role of calling party and the other is called party.
The same two users can have two connections - in this case the roles calling/called party are switched when the direction is changed.
In the original table (monthly_connections) I added additional columns common_neighbors and total_neighbors where the number of common and total neighbors are stored. To clarify the terms common and total_neighbors I added the following image:
In this case for the observed connection there are 2 common neighbors of calling and called party and 6 total neighbors.
In order to get these two values I wrote the following stored procedure:
CREATE PROCEDURE [dbo].[spCountNeighbors]
AS
Declare
@CallingParty varchar(50),
@CalledParty varchar(50),
@RecordsUpdated int
SET @CallingParty ='a'
SET @RecordsUpdated = 0
PRINT GETDATE()
WHILE @CallingParty IS NOT NULL BEGIN
SET @CallingParty = NULL
SELECT TOP 1 @CallingParty = calling_party, @CalledParty = called_party FROM monthly_connections WHERE common_neighbors IS NULL
--PRINT @CallingParty
IF @CallingParty IS NOT NULL BEGIN
WITH callingPartyNeighbors AS
(
SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CallingParty
UNION
SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CallingParty
),
calledPartyNeighbors AS
(
SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CalledParty
UNION
SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CalledParty
)
UPDATE mc SET common_neighbors = (SELECT COUNT (*) FROM
(
SELECT neighbor FROM callingPartyNeighbors
INTERSECT
SELECT neighbor FROM calledPartyNeighbors
)
t1
),
total_neighbors = (SELECT COUNT (*) FROM
(
SELECT neighbor FROM callingPartyNeighbors
UNION
SELECT neighbor FROM calledPartyNeighbors
)
t2
)
FROM monthly_connections mc WHERE (mc.calling_party = @CallingParty AND mc.called_party = @CalledParty) OR (mc.called_party = @CallingParty AND mc.calling_party = @CalledParty);
SET @RecordsUpdated = @RecordsUpdated + @@ROWCOUNT
PRINT @RecordsUpdated
END
END
PRINT @RecordsUpdated
The procedure above is supposed to go through the table of connections which contains 23M connections and update values common_neighbors and total_neighbors for each row. The problem however is that the procedure is too slow - it took 212 s to update 1000 records.
I would really appreciate if anyone of you suggested any fix for the procedure above in order to speed up the 开发者_JAVA技巧execution time.
Thank you!
Following script produces the same output for the common_neighbors
as your stored procedure does.
Somehow though, I have the feeling it isn't exactly (yet) what you need but you might pick it up for some new ideas.
DECLARE @monthly_connections TABLE (
calling_party VARCHAR(50)
, called_party VARCHAR(50)
, common_neighbors INTEGER
, total_neighbors INTEGER)
INSERT INTO @monthly_connections
SELECT '1', '3', NULL, NULL
UNION ALL SELECT '2', '4', NULL, NULL
UNION ALL SELECT '3', '2', NULL, NULL
UNION ALL SELECT '3', '4', NULL, NULL
UNION ALL SELECT '3', '6', NULL, NULL
UNION ALL SELECT '3', '7', NULL, NULL
UNION ALL SELECT '4', '5', NULL, NULL
UNION ALL SELECT '8', '4', NULL, NULL
;WITH q AS (
SELECT calling_party, called_party
FROM @monthly_connections mc1
UNION ALL
SELECT called_party, calling_party
FROM @monthly_connections mc1
)
UPDATE @monthly_connections
SET common_neighbors = common_neighbors.cnt
FROM @monthly_connections mc
INNER JOIN (
SELECT q1.calling_party, q1.called_party, cnt = COUNT(*)
FROM q q1
INNER JOIN q q2 ON q2.calling_party = q1.called_party
INNER JOIN q q3 ON q3.calling_party = q2.called_party
AND q3.called_party = q1.calling_party
GROUP BY
q1.calling_party, q1.called_party
) common_neighbors ON common_neighbors.calling_party = mc.calling_party
AND common_neighbors.called_party = mc.called_party
SELECT *
FROM @monthly_connections
In you procedure you are doing a lot of subquery, which i suppose is the primary source of your loss of performance. Can't you just replace the multiple query with a big join and then filter on it? Something like
SELECT T.calling_party, T.called_party, A.called_party, B.called_party
from table T
join table as A
on T.calling_party = A.calling_party
join table as B
on T.calling_party = B.calling_party
where A.called_party = B.called_party --to get the commong neighbour
You will probably need another join on the called_party to get the full list, however i think this could be faster than iterate through 23M records and calling multiple queries for all of them.
精彩评论