开发者

Compute social distance between two users

How you would code an efficient algorithm which can return a social 'distance' between two users.

For example, when you visit a profile on LinkedIn you can see what is the distance between you 开发者_StackOverflow中文版and the user.

-> user A is friend with user B - and B is friend with C. when A will visit C (the distance will be 1)

The graph is huge and so I'm wondering how it can be perform so fast.

I know that this question is likely to be closed but I really think it is a programming/algorithm question - I would not specify any languages because I am interested about the concept.


assuming you don't have any heuristic function about the distance to the target, the best solution that is valid is bi-directional BFS:
Algorithm idea: do a BFS search simultaneously from the source and the target: [BFS until depth 1 in both, until depth 2 in both, ....].
The algorithm will end when you find a vertex v, which is in both BFS's front.

Algorithm behavior: The vertex v that terminates the algorithm's run will be exactly in the middle between the source and the target.
This algorithm will yield much better result in most cases then BFS from the source [explanation why it is better then BFS follows], and will surely provide an answer, if one exist.

why is it better then BFS from the source?
assume the distance between source to target is k, and the branch factor is B [every vertex has B edges].
BFS will open: 1 + B + B^2 + ... + B^k vertices.
bi-directional BFS will open: 2 + 2B + 2B^2 + 2B^3 + .. + 2B^(k/2) vertices.

for large B and k, the second is obviously much better than the first.

EDIT:
NOTE, that this solution does NOT require storing the whole graph in memory, it only requires implementing a function: successor(v) which returns all the successors of a vertex [all vertices you can get to, within 1 step from v]. With this, only the nodes you open [2 + 2B + ... + 2B^(k/2) as explained above] should be stored. To further save memory, you can use Iterative Deepening DFS from one direction, instead of BFS, but it will consume more time.


I would have assumed that this would be done by applying a shortest path algorithm, such as breadth first search to a graph database. But they appear to store their entire graph in memory, at least according to this.

I'm sure that the algorithm ultimately boils down to some form of shortest path one over a graph structure (nodes and edges).

Edit: Changed the algorithm as per comments.


First the graph needs to be populated. I cannot say about how you get the graph from linked in, probably making a BFS or DFS of the nodes, discover the graphs, and establish links. To find the distance between any two best is to make a BFS from the source node and stop when the destination is found. The links does not have weights, i think, if you do not imply something other.

In this case you need to apply a BFS each for finding distance between each pair, when the source node is different. Else, you can implement the Floyd Warshall algorithm to get all source all destination shortest paths, and because each link has same weight it will get what you want. In this case once structure is made, for any source and destination the shortest distance can be found. One problem is that the network is always changing therefore re-processing is needed. Therefore BFS i think will be good.

To make processing faster you can implement BFS to run in parallel. Have a look at Design and analysis of a nondeterministic parallel breadth-first search algorithm

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜