开发者

In a graph, how to calculate sum of all nodes which a node can reach efficiently?

Given a directed graph, with weight assigned to every node.

Start from any node A, there 开发者_JAVA百科will be a set of nodes which could be reached from A.

Define SUM as total weight of this set.

Question: How to calculate SUM for every node in this graph efficiently?

The graph may contain millions of nodes.

Update 1:

The graph data structure is composed of the set of start nodes, and each node have a set of point-to nodes.

What I tried:

Calculate the descendants of every node recursively, and calculate the SUM according to descendant set. This recursion is very inefficient since I have to do set union, uteration operation for a lot of times. Further, I tried to cache the descendant set at each node. However, it easily runs out of memory.

So, is there any other solution?

Update 2:

Example graph, edges are all directed, upper nodes points to lower nodes

In a graph, how to calculate sum of all nodes which a node can reach efficiently?

Result should be

SUM(1)=55 SUM(2)=35 SUM(3)=41 SUM(4)=19 SUM(5)=22 SUM(6)=25 ...


Find the SCC's of the graph (Tarjan's algorithm, or a double DFS run).

For each SCC calculate the sum of it's node weights, denote this value by PARTIAL-SUM.

Iterate over the SCC's in reverse topological order; for every node in each SCC, its SUM will be the sum of all the SUM values of adjacent SCC's plus it's own PARTIAL-SUM value.

Linear running time O(E+V) since finding SCC's is linear, topological sort is linear, and the summation is linear since we visit each SCC at most once and each branch at most once.

EDIT

As was pointed out in the comments by tzkuzzy parallel paths pose a problem. That is easily solved by a simple DFS run on the SCC graph. On any cross-edge we simply take the already visited node up the DFS tree until we reach a not-fully-searched parent, this pair of nodes (the visited on the bottom, and the ancestor) has two distinct paths between them, we make a list for each node of such descendants, and on summation merely subtract their PARTIAL-SUM value.

So if:

 u
/ \
\ /
 w

Our DFS will pick up the cross edge connecting from a child node of u to w, and trace back to u (for those familiar with the typical DFS taught in schools the easiest explanation is that u is characterised as the first grey ancestor of w), so we add w to the list we maintain on u.

Then when we sum each SCC's adjacent SCC's as described, we add an extra step where we loop over the list mentioned and simply subtract PARTIAL-SUM values.

The DFS itself is still linear. Backtracking from a node to an ancestor can be linear if we cache the results (that way we don't traverse the same edge more than once). And the additional work in the summation is at most O(V), so we haven't changed the running time.

EDIT

Inclusion-exclusion makes this more difficult than I first thought. This solution isn't complete and doesn't work. A simple BFS for each node is more expensive but easier and will work.


My previous answer is a nice idea but inclusion-exclusion of intersecting subtrees is very difficult to solve (something I think your recursive approach will suffer from as well). Come to think of it, I'm beginning to doubt if there even exists a substructure of optimality in this problem that would lend to a dynamic programming solution...

I will propose a much simpler (albeit less efficient) solution:

For each node, run a BFS/DFS and sum the values of all nodes encountered. It'll run in O(V) space and O(V(E+V)) = O(EV) time, but is super simple to implement and doesn't need to be recursive.

You can optimise by finding the SCC graph and instead of performing the BFS/DFS on every node do so once for every SCC. This may increase running time by a huge factor if the graph is "cliquey".

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜