Finding Shortest path in DAG (unweighed), between 2 vertices
Before the Floyd–Warshall/Dijkstra replies flood comes in please let me explain the situation as i'm sure either algorithm can be tuned for this case, and it has to be as this is not a toy example program (mind you, in java so have to keep it manageable memory-wise)
What i have is a web graph generated from node 0 to node n, node 3 cannot link to node 5, because node 5 didnt exist when node 3 was choosing it's out links. Every "node" is represented as in_neighbours[nodeID] and out_neighbours[nodeID] say nodeId=3, so we're talking about node 3. Note also that in_/out_ are both sorted, (in_ is naturally sorted as 5 will have chosen its out links all at once, only then 6 will choose out_links so 3's in_'s can never contain {6, 5, 7}) and ofc both can contain duplicates. (in/out are ArrayList arrays of size n, where out_ is always of size d or m, which along with n is specified at startup by the user)
No weights. What i must do is find the averageDistance()
public double getAvgDistance() {
int sum = 0;
for (int i=1; i<n; i++) {
for (int j=0; j < i; j++) {
sum += dist(i, j); // there are duplicates, make sure i skip
}
}
return (double)sum / (double)( ((n*(n-1)) / 2) );
}
What I have so far is the best case. Note i want only to find the distance between j & i,开发者_高级运维 not all distances at the same time (not enough memory, it will be tested at m=20 d=1 000 000)
private int dist(int i, int j) {
int dist = 0;
for (int link : in_neighbours[j]) {
System.out.print("\nIs "+j+" linked to by "+i);
if (out_neighbours[i].contains(link)) {
System.out.print(" - yes!");
dist = 1;
}
}
return dist;
}
So im asking if the "fresher" (ofc at this point the graph is completed) node i is linking to any of its older buddies directly if so, distance is 1 hop.
Is it just me or the 'shortest' path will always be the first found path if nodes are traversed backwards?
How do i check if its not 1, the "else" after the base case? My math is fairly weak please be gentle :) Any hints how to make use of the fact that the links are sorted?
It's not homework or something that im trying to cheat around from, it's not about the code itself, this has to be a useful tool, the "learning" comes by itself along the way.
here's how a graph looks nodeID, out links, in links for m=7 n=13, (note the 0 cycles is just how the graph is initialized):
0 | 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 3 4 5 6 6 7 8 9
1 | 0 0 0 0 0 0 0 | 2 2 3 4 5 5 8 12
2 | 0 0 0 0 0 1 1 | 3 3 3 3 3 4 4 4 6 7 8 10
3 | 0 1 2 2 2 2 2 | 4 4 5 5 6 6 7 11
4 | 0 1 2 2 2 3 3 | 5 5 6 8 9 10
5 | 0 1 1 3 3 4 4 | 6 7 8 9 9 11 12
6 | 0 0 2 3 3 4 5 | 7 7 7 8 9 9 12
7 | 0 2 3 5 6 6 6 | 8 9 10 11 11 12
8 | 0 1 2 4 5 6 7 | 10 10 10 11 12
9 | 0 4 5 5 6 6 7 | 10 11 11
10 | 2 4 7 8 8 8 9 | 12 12
11 | 3 5 7 7 8 9 9 |
12 | 1 5 6 7 8 10 10 |
Sorry for the agonising long read. EDIT: Wrong code in the methods, this is what i think is correct now.
Revision of dist nr2, just try and find if theres a path at all:
private int dist(int i, int j) {
int dist = 0, c = 0, count = 0;
boolean linkExists = false;
for (int link : in_neighbours[j]) {
//System.out.print("\nIs "+j+" linked to by "+i);
if (out_neighbours[i].contains(link)) {
//System.out.print(" - yes!");
dist = 1; // there is a direct link
} else {
while ( c < j ) {
// if there's a path from 0 up to j, check if 'i' links to a node which eventually links to 'j'
if (out_neighbours[i].contains(c) &&
(out_neighbours[c].contains(link) || in_neighbours[c].contains(link) )) {
count++; // yes. and this is one node we had to step through to get closer
linkExists = true;
} else {
linkExists = false; // unreachable, the path was interrupted somewhere on the way
break;
}
c++;
}
if (linkExists) {
dist = count-1; // as 2 nodes are linked with 1 edge
} else {
dist = 0; // no path was found
}
}
}
return dist;
}
Since all edges have the same weight in your model, you can use a BFS search to find the shortest path from S to T.
This is an iterative process, starting with set #0, containing only the source node ({S}). At each step i, you create set #i by finding all nodes achievable from set (i-1) in one step.
The iteration terminates in two cases:
1) When you detect that set #k contains T. In this case you return k-1.
2) When the set is empty, meaning that the two nodes are unreachable.
The memory consumption is about twice the number of nodes, since at each step i you are working with two sets (i-1 and i), bounded by the total number of nodes.
--EDIT--
Here is a possible implementation (I made some tests on it):
private Integer getDist(int i, int j) {
Set<Integer> currentSet = new HashSet<Integer>();
currentSet.add(i);
int dist = 0;
while (true) {
Set<Integer> nextSet = new HashSet<Integer>();
for (Integer currNode : currentSet)
nextSet.addAll(out[currNode]);
if (nextSet.isEmpty())
return null; //i.e. infinite
if (nextSet.contains(j))
return dist;
dist++;
currentSet = nextSet;
}
}
The implementation assumes that in and out are defined as List<Integer>[], and nodes are identified by numbers starting from 0. The minimal distance is counted as the number of intermediate nodes in the path, and not as the number of edges.
The duplicates you have in the lists do not break anything here, but they are irrelevant for the algorithm.
精彩评论