开发者

Any suggestion for optimizing the following query which counts common and all neighbors?

The table consists of columns calling_party and called_party and the record describes the connection between two users where one plays a role of calling party and the other is called party.

The same two users can have two connections - in this case the roles calling/called party are switched when the direction is changed.

In the original table (monthly_connections) I added additional columns common_neighbors and total_neighbors where the number of common and total neighbors are stored. To clarify the terms common and total_neighbors I added the following image:

Any suggestion for optimizing the following query which counts common and all neighbors?

In this case for the observed connection there are 2 common neighbors of calling and called party and 6 total neighbors.

In order to get these two values I wrote the following stored procedure:

CREATE PROCEDURE [dbo].[spCountNeighbors]  

AS

Declare 
@CallingParty varchar(50),
@CalledParty varchar(50),
@RecordsUpdated int

SET @CallingParty ='a'
SET @RecordsUpdated = 0
PRINT GETDATE()
WHILE @CallingParty IS NOT NULL BEGIN
    SET @CallingParty = NULL
    SELECT TOP 1 @CallingParty = calling_party, @CalledParty = called_party FROM    monthly_connections WHERE common_neighbors IS NULL
    --PRINT @CallingParty
    IF @CallingParty IS NOT NULL BEGIN
    WITH callingPartyNeighbors AS
    (
        SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CallingParty
        UNION
        SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CallingParty
    ),
    calledPartyNeighbors AS
    (
        SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CalledParty
        UNION
        SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CalledParty
    )

        UPDATE mc SET common_neighbors = (SELECT COUNT (*) FROM
        (
        SELECT neighbor FROM callingPartyNeighbors
        INTERSECT
        SELECT neighbor FROM calledPartyNeighbors
        )
        t1
        ),
        total_neighbors = (SELECT COUNT (*) FROM
        (
        SELECT neighbor FROM callingPartyNeighbors
        UNION
        SELECT neighbor FROM calledPartyNeighbors
        )
        t2
        )
         FROM monthly_connections mc WHERE (mc.calling_party = @CallingParty AND mc.called_party = @CalledParty) OR (mc.called_party = @CallingParty AND mc.calling_party = @CalledParty);
        SET @RecordsUpdated = @RecordsUpdated + @@ROWCOUNT
        PRINT @RecordsUpdated
    END 
END
PRINT @RecordsUpdated

The procedure above is supposed to go through the table of connections which contains 23M connections and update values common_neighbors and total_neighbors for each row. The problem however is that the procedure is too slow - it took 212 s to update 1000 records.

I would really appreciate if anyone of you suggested any fix for the procedure above in order to speed up the 开发者_JAVA技巧execution time.

Thank you!


Following script produces the same output for the common_neighbors as your stored procedure does.

Somehow though, I have the feeling it isn't exactly (yet) what you need but you might pick it up for some new ideas.

DECLARE @monthly_connections TABLE (
  calling_party VARCHAR(50)
  , called_party VARCHAR(50)
  , common_neighbors INTEGER
  , total_neighbors INTEGER)

INSERT INTO @monthly_connections
          SELECT '1', '3', NULL, NULL
UNION ALL SELECT '2', '4', NULL, NULL
UNION ALL SELECT '3', '2', NULL, NULL
UNION ALL SELECT '3', '4', NULL, NULL
UNION ALL SELECT '3', '6', NULL, NULL
UNION ALL SELECT '3', '7', NULL, NULL
UNION ALL SELECT '4', '5', NULL, NULL
UNION ALL SELECT '8', '4', NULL, NULL

;WITH q AS (
  SELECT  calling_party, called_party
  FROM    @monthly_connections mc1
  UNION ALL
  SELECT  called_party, calling_party
  FROM    @monthly_connections mc1
)
UPDATE  @monthly_connections
SET     common_neighbors = common_neighbors.cnt
FROM    @monthly_connections mc
        INNER JOIN (
          SELECT  q1.calling_party, q1.called_party, cnt = COUNT(*) 
          FROM    q q1
                  INNER JOIN q q2 ON q2.calling_party = q1.called_party                          
                  INNER JOIN q q3 ON q3.calling_party = q2.called_party
                                     AND q3.called_party = q1.calling_party
          GROUP BY
                  q1.calling_party, q1.called_party
        ) common_neighbors ON common_neighbors.calling_party = mc.calling_party
                              AND common_neighbors.called_party = mc.called_party


SELECT *
FROM  @monthly_connections        


In you procedure you are doing a lot of subquery, which i suppose is the primary source of your loss of performance. Can't you just replace the multiple query with a big join and then filter on it? Something like

SELECT T.calling_party, T.called_party, A.called_party, B.called_party
from table T
join table as A
on T.calling_party = A.calling_party
join table as B
on T.calling_party = B.calling_party
where A.called_party = B.called_party --to get the commong neighbour 

You will probably need another join on the called_party to get the full list, however i think this could be faster than iterate through 23M records and calling multiple queries for all of them.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜