Finding similarity between users in a telecommunication network
I have an anonymized table with two columns: UserId and PhoneNumber.
It was selected from a Call Detail Record table. Now I would like to build a network based on similarity between users: two users should be connected if they have called at least 3 of the same numbers.
There are more than 20 million rows. A simple program I wrote in C# would take more than 4 days to finish this task. Is it possible to write an SQL query that gives the same result, either inserting a row into a new table with two columns, user1 and user2, for every similar pair, or just returning the pairs as output?
Or is there some other good way to accomplish this task?
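Stated concretely: collect, for every phone number, the set of users who called it; every pair of users in such a set shares that number, and a pair becomes an edge once it shares 3 or more distinct numbers. The following is a minimal in-memory sketch of that counting in Python, assuming the (UserId, PhoneNumber) rows fit in RAM; the function `similar_pairs`, its `min_shared` parameter and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def similar_pairs(rows, min_shared=3):
    """Return sorted (user1, user2) pairs, user1 < user2, that share at
    least `min_shared` distinct called numbers.

    `rows` is an iterable of (user_id, phone_number) tuples, i.e. the two
    columns of the table (hypothetical helper, for illustration only).
    """
    # Inverted index: phone number -> set of users who called it.
    # The set also drops repeated calls from the same user to one number.
    callers = defaultdict(set)
    for user_id, phone in rows:
        callers[phone].add(user_id)

    # For every pair of users, count how many numbers they have in common.
    shared = defaultdict(int)
    for users in callers.values():
        for u1, u2 in combinations(sorted(users), 2):
            shared[(u1, u2)] += 1

    return sorted(pair for pair, n in shared.items() if n >= min_shared)


# Made-up sample data; user 1 called 555-0102 twice.
rows = [
    (1, "555-0100"), (1, "555-0101"), (1, "555-0102"), (1, "555-0102"),
    (2, "555-0100"), (2, "555-0101"), (2, "555-0102"),
    (3, "555-0100"), (3, "555-0101"),
]
print(similar_pairs(rows))  # [(1, 2)]
```

The inner loop does work proportional to the square of the number of callers of each phone number, so a handful of very widely called numbers can dominate the running time whether the counting happens in C#, Python or SQL.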
Assuming your table is called CallingList, you should be able to use a query like this:
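```sql
-- Pairs of users who called at least 3 of the same numbers
SELECT C1.UserID AS User1, C2.UserID AS User2
FROM CallingList AS C1
JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
WHERE C1.UserID < C2.UserID                 -- each unordered pair once, no self-pairs
GROUP BY C1.UserID, C2.UserID
HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3  -- repeated calls to one number count once
```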
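One way to try this self-join without touching the production database is Python's built-in `sqlite3` module on a few made-up rows. The sketch below also covers the "insert into a new table" variant by wrapping the query in standard `INSERT ... SELECT`; the target table name `UserSimilarity` is hypothetical, and the count is taken over distinct phone numbers so that a user who called the same number several times is not counted more than once.

```python
import sqlite3

# Toy stand-in for the real 20-million-row table; the values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CallingList (UserID INTEGER, PhoneNumber TEXT);
    CREATE TABLE UserSimilarity (User1 INTEGER, User2 INTEGER);
""")
conn.executemany(
    "INSERT INTO CallingList VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101"), (1, "555-0102"), (1, "555-0102"),
     (2, "555-0100"), (2, "555-0101"), (2, "555-0102"),
     (3, "555-0100"), (3, "555-0101")],
)

# Same self-join as the answer, materialized with INSERT ... SELECT
# so the similar pairs land in a new two-column table.
conn.execute("""
    INSERT INTO UserSimilarity (User1, User2)
    SELECT C1.UserID, C2.UserID
    FROM CallingList AS C1
    JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
    WHERE C1.UserID < C2.UserID
    GROUP BY C1.UserID, C2.UserID
    HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3
""")
conn.commit()

pairs = conn.execute(
    "SELECT User1, User2 FROM UserSimilarity ORDER BY User1, User2"
).fetchall()
print(pairs)  # [(1, 2)]
```

Users 1 and 3 (and 2 and 3) share only two numbers, so only the pair (1, 2) is written.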
Whether that will be faster than your C# program remains to be seen.
Make sure you have an index on CallingList(PhoneNumber), unless your optimizer creates one automatically behind the scenes.
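To check whether the optimizer actually uses such an index, most databases can show the query plan. A minimal sketch with SQLite's `EXPLAIN QUERY PLAN` follows; SQLite is used only because it ships with Python, and the index name `idx_callinglist_phone_user` is hypothetical. Putting UserID second in the index is an assumption worth measuring on your own database: it lets the self-join read both columns from the index alone (a covering index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CallingList (UserID INTEGER, PhoneNumber TEXT)")

# Composite index: PhoneNumber first for the join condition, UserID second
# so the query can be answered from the index without reading the table.
conn.execute(
    "CREATE INDEX idx_callinglist_phone_user "
    "ON CallingList (PhoneNumber, UserID)"
)

query = """
    SELECT C1.UserID, C2.UserID
    FROM CallingList AS C1
    JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
    WHERE C1.UserID < C2.UserID
    GROUP BY C1.UserID, C2.UserID
    HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is a
# human-readable description of how each table is accessed.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

The exact wording of the plan differs between SQLite versions, but a line mentioning the index for the `C2` lookup indicates that the join is driven by the index rather than a full scan.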