开发者

Selecting all users within n-th degree of connection

The table consists of columns user1, user2, connectionStrength and the example records are the following:

A,B,0.2
A,C,0.5
A,G,0.1
B,C,0.8
W,Q,0.4
G,Q,0.5

I would like to retrieve all users within the defined degree of connection for the selected user and then draw a graph of connections. The question however is how to select all the records from the table that fulfill the condition. For example if user A is selected and the degree is set to 2 the following records from the example should be selected:

A,B,0.2
A,C,0.5
A,G,0.1
B,C,0.8
G,Q,0.5

The example above is hypothetical. In reality there are more than 200M connections in the database I work on and currently I use C# and Microsoft SQL Server 2008 for the analysis.

Does anyone know how to write a function (string GetQuery(string selectedUser, int degreeOfConnection)) to compose a query which returns all the开发者_开发技巧 records that fulfill the condition (degree of connection for the selected user)?

Edit #1

I tried to get connections by executing recursive query the following way:

WITH user_connections(user1, user2, link_strength, Level)
AS
(
SELECT user1, user2, link_strength, 0 AS Level FROM [dbo].[monthly_connections] AS mc WHERE user1 = '1ADF1126F26B4AD4441A3C552FCE04A4F7A79760'
UNION ALL
SELECT mc.user1, mc.user2, mc.link_strength, Level + 1 FROM [dbo].[monthly_connections] AS mc INNER JOIN user_connections AS uc ON uc.user2 = mc.user1
)
SELECT user1, user2, link_strength FROM user_connections OPTION(MAXRECURSION 1)

The query has been executing for more than 40 minutes so far so I would be very thankful if anyone could just check if the statement is composed correctly.

Thank you!


SQL 2005 and up allows you to create recursive queries using Common Table Expressions. The samples in the documentation should be enough to get you started.


It's really problematic to implement that search quickly since the number of contacts grows exponentially with the degree of connection. I'd try a meet in the middle algorithm. From both users find contacts of the n/2 degree and then check if these two sets have somebody in common.

If you need these queries often you might consider not running them in the database, but to load the connections into a LookUp and run the queries in C#. And ordering the value users in the Lookup by popularity might improve performance too. Since the connection is more likely to occur via a popular person.


maybe this helps: http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜