Shortest-path algorithms which use a space-time tradeoff?
Problem: finding shortest paths in an unweighted, undirected graph.
Breadth-first search can find the shortest path between two nodes, but this can take up to O(|V| + |E|) time. A precomputed lookup table would allow requests to be answered in O(1) time, but at the cost of O(|V|^2) space.
What I'm wondering: Is there an algorithm which offers a space-time tradeoff that's more fine-grained? In other words, is there an algorithm which:
- Finds shortest paths in more time than O(1), but is faster than a bidirectional breadth-first search
- Uses precomputed data which takes up less space than O(|V|^2)?
On the practical side: The graph is 800,000 nodes and is believed to be a small-world network. The all-pairs shortest paths table would be on the order of gigabytes -- that's not outrageous these days, but it doesn't suit our requirements.开发者_运维技巧
However, I am asking my question out of curiosity. What's keeping me up at night is not "how can I reduce cache misses for an all-pairs lookup table?", but "Is there a completely different algorithm out there that I've never heard of?"
The answer may be no, and that's okay.
You should start by looking at Dijkstra's algorithm for finding the shortest path. The a* algorithm is a variant that uses a heuristic to reduce the time taken to calculate the optimal route between the start and goal node (such as the euclidean distance). You can modify this heuristic for performance or accuracy.
It seems as if your input set must be very large, if a lookup table will be too large to store on the disk. I assume that that the data will not fit in RAM then, which means that whatever algorithm you use should be tuned to minimize the amounts of reads and writes. Whenever disks are involved space == time, because writing to disk is so slow.
The exact algorithm you should use depends on what kind of graph you have. This research paper might be of interest to you. Full disclosure: I have not read it myself, but it seems like it might be what you are looking for.
Edit:
If the graph is (almost) connected, which a small-world network is, a lookup table can't be smaller than V^2. This means that all lookups will require disk access. If the edges fit in main memory, it might be faster to just compute the path every time. Otherwise, you might compute the path from a table containing the lengths of all shortests paths. You can reconstruct the path from that table.
The key is to make sure that the entries in the table which are close to each other in either direction are also close to each other on the disk. This storage pattern accomplishes that:
1 2 1 2 5 6
3 4 3 4 7 8
9 10 13 14
11 12 15 16
It will also work well with the cache hierarchy.
In order to compute the table you might use a modified Floyd-Warshall, where you process the data in blocks. This would let you perform the computation in a reasonable amount of time, especially if you parallelize it.
精彩评论