For every vertex in a graph, find all vertices within a distance d
In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A b开发者_Python百科readth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
if d == 0:
st.add(start)
else:
st.add(start)
for edge in graph[start]:
find_d(graph, edge, st, d-1)
return st
graph = { 1 : [2, 3],
2 : [1, 4, 5, 6],
3 : [1, 4],
4 : [2, 3, 5],
5 : [2, 4, 6],
6 : [2, 5]
}
print find_d(graph, 1, set(), 2)
Let's say that we have a function verticesWithin(d,x)
that finds all vertices within distance d
of vertex x
.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that verticesWithin(d,x)
if d >= 1
is the union of vertices(d-1,y[i])
for all i
within range, where y=verticesWithin(1,x)
. If d == 0
then it's simply {x}
. (I'm assuming that a vertex is deemed to be of distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1
, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x
itself as a member of y
.
Also, if the return type of verticesWithin(d,x)
is changed from a simple list or set, to a list of d
sets representing increasing distance from x
, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init
is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin
, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
You already mention the option of calculating A^d
, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v
of zeros and ones, representing a set of vertices. The vector w := A v
now has a one at every node that can be reached from the starting node in exactly one step. Iterating, u := A w
has a one for every node you can reach from the starting node in exactly two steps, etc.
For d=3
, you could do the following (MATLAB pseudo-code):
v = j'th unit vector
w = v
for i = (1:d)
v = A*v
w = w + v
end
the vector w
now has a positive entry for each node that can be accessed from the j
th node in at most d
steps.
Breadth first search starting with the given vertex is an optimal solution in this case. You will find all the vertices that within the distance d, and you will never even visit any vertices with distance >= d + 2.
Here is recursive code, although recursion can be easily done away with if so desired by using a queue.
// Returns a Set
Set<Node> getNodesWithinDist(Node x, int d)
{
Set<Node> s = new HashSet<Node>(); // our return value
if (d == 0) {
s.add(x);
} else {
for (Node y: adjList(x)) {
s.addAll(getNodesWithinDist(y,d-1);
}
}
return s;
}
精彩评论