Fast algorithm for counting the number of acyclic paths on a directed graph

2023-02-22 18:55 问答作者：

In short, I need a fast algorithm to count how many acyclic paths are there in a simple directed graph.

By simple graph I mean one without self loops or multiple edges. 开发者_StackOverflowA path can start from any node and must end on a node that has no outgoing edges. A path is acyclic if no edge occurs twice in it.

My graphs (empirical datasets) have only between 20-160 nodes, however, some of them have many cycles in them, therefore there will be a very large number of paths, and my naive approach is simply not fast enough for some of the graph I have.

What I'm doing currently is "descending" along all possible edges using a recursive function, while keeping track of which nodes I have already visited (and avoiding them). The fastest solution I have so far was written in C++, and uses std::bitset argument in the recursive function to keep track of which nodes were already visited (visited nodes are marked by bit 1). This program runs on the sample dataset in 1-2 minutes (depending on computer speed). With other datasets it takes more than a day to run, or apparently much longer.

The sample dataset: http://pastie.org/1763781 (each line is an edge-pair)

Solution for the sample dataset (first number is the node I'm starting from, second number is the path-count starting from that node, last number is the total path count): http://pastie.org/1763790

Please let me know if you have ideas about algorithms with a better complexity. I'm also interested in approximate solutions (estimating the number of paths with some Monte Carlo approach). Eventually I'll also want to measure the average path length.

Edit: also posted on MathOverflow under same title, as it might be more relevant there. Hope this is not against the rules. Can't link as site won't allow more than 2 links ...

This is #P-complete, it seems. (ref http://www.maths.uq.edu.au/~kroese/ps/robkro_rev.pdf). The link has an approximation

If you can relax the simple path requirement, you can efficiently count the number of paths using a modified version of Floyd-Warshall or graph exponentiation as well. See All pairs all paths on a graph

As mentioned by spinning_plate, this problem is #P-complete so start looking for your aproximations :). I really like the #P-completeness proof for this problem, so I'd think it would be nice to share it:

Let N be the number of paths (starting at s) in the graph and p_k be the number of paths of length k. We have:

N = p_1 + p_2 + ... + p_n

Now build a second graph by changing every edge to a pair of paralel edges.For each path of length k there will now be k^2 paths so:

N_2 = p_1*2 + p_2*4 + ... + p_n*(2^n)

Repeating this process, but with i edges instead of 2, up n, would give us a linear system (with a Vandermonde matrix) allowing us to find p_1, ..., p_n.

N_i = p_1*i + p_2*(i^2) + ...

Therefore, finding the number of paths in the graph is just as hard as finding the number of paths of a certain length. In particular, p_n is the number of Hamiltonian Paths (starting at s), a bona-fide #P-complete problem.

I havent done the math I'd also guess that a similar process should be able to prove that just calculating average length is also hard.

Note: most times this problem is discussed the paths start from a single edge and stop wherever. This is the opposite from your problem, but you they should be equivalent by just reversing all the edges.

Importance of Problem Statement

It is unclear what is being counted.

Is the starting node set all nodes for which there is at least one outgoing edge, or is there a particular starting node criteria?
Is the the ending node set the set of all nodes for which there are zero outgoing edges, or can any node for which there is at least one incoming edge be a possible ending node?

Define your problem so that there are no ambiguities.

Estimation

Estimations can be off by orders of magnitude when designed for randomly constructed directed graphs and the graph is very statistically skewed or systematic in its construction. This is typical of all estimation processes, but particularly pronounced in graphs because of their exponential pattern complexity potential.

Two Optimizing Points

The std::bitset model will be slower than bool values for most processor architectures because of the instruction set mechanics of testing a bit at a particular bit offset. The bitset is more useful when memory footprint, not speed is the critical factor.

Eliminating cases or reducing via deductions is important. For instance, if there are nodes for which there is only one outgoing edge, one can calculate the number of paths without it and add to the number of paths in the sub-graph the number of paths from the node from which it points.

Resorting to Clusters

The problem can be executed on a cluster by distributing according to starting node. Some problems simply require super-computing. If you have 1,000,000 starting nodes and 10 processors, you can place 100,000 starting node cases on each processor. The above case eliminations and reductions should be done prior to distributing cases.

A Typical Depth First Recursion and How to Optimize It

Here is a small program that provides a basic depth first, acyclic traversal from any node to any node, which can be altered, placed in a loop, or distributed. The list can be placed into a static native array by using a template with a size as one parameter if the maximum data set size is known, which reduces iteration and indexing times.

#include <iostream>
#include <list>

class DirectedGraph {

    private:
        int miNodes;
        std::list<int> * mnpEdges;
        bool * mpVisitedFlags;

    private:
        void initAlreadyVisited() {
            for (int i = 0; i < miNodes; ++ i)
                mpVisitedFlags[i] = false;
        }

        void recurse(int iCurrent, int iDestination,
               int path[], int index,
               std::list<std::list<int> *> * pnai) {

            mpVisitedFlags[iCurrent] = true;
            path[index ++] = iCurrent;

            if (iCurrent == iDestination) {
                auto pni = new std::list<int>;
                for (int i = 0; i < index; ++ i)
                    pni->push_back(path[i]);
                pnai->push_back(pni);

            } else {
                auto it = mnpEdges[iCurrent].begin();
                auto itBeyond = mnpEdges[iCurrent].end();
                while (it != itBeyond) {
                    if (! mpVisitedFlags[* it])
                        recurse(* it, iDestination,
                                path, index, pnai);
                    ++ it;
                }
            }

            -- index;
            mpVisitedFlags[iCurrent] = false;
        } 

    public:
        DirectedGraph(int iNodes) {
            miNodes = iNodes;
            mnpEdges = new std::list<int>[iNodes];
            mpVisitedFlags = new bool[iNodes];
        }

        ~DirectedGraph() {
            delete mpVisitedFlags;
        }

        void addEdge(int u, int v) {
            mnpEdges[u].push_back(v);
        }

        std::list<std::list<int> *> * findPaths(int iStart,
                int iDestination) {
            initAlreadyVisited();
            auto path = new int[miNodes];
            auto pnpi = new std::list<std::list<int> *>();
            recurse(iStart, iDestination, path, 0, pnpi);
            delete path;
            return pnpi;
        }
};

int main() {

    DirectedGraph dg(5);

    dg.addEdge(0, 1);
    dg.addEdge(0, 2);
    dg.addEdge(0, 3);
    dg.addEdge(1, 3);
    dg.addEdge(1, 4);
    dg.addEdge(2, 0);
    dg.addEdge(2, 1);
    dg.addEdge(4, 1);
    dg.addEdge(4, 3);

    int startingNode = 0;
    int destinationNode = 1;

    auto pnai = dg.findPaths(startingNode, destinationNode);

    std::cout
            << "Unique paths from "
            << startingNode
            << " to "
            << destinationNode
            << std::endl
            << std::endl;

    bool bFirst;
    std::list<int> * pi;
    auto it = pnai->begin();
    auto itBeyond = pnai->end();
    std::list<int>::iterator itInner;
    std::list<int>::iterator itInnerBeyond;
    while (it != itBeyond) {
        bFirst = true;
        pi = * it ++;
        itInner = pi->begin();
        itInnerBeyond = pi->end();
        while (itInner != itInnerBeyond) {
            if (bFirst)
                bFirst = false;
            else
                std::cout << ' ';
            std::cout << (* itInner ++);
        }
        std::cout << std::endl;
        delete pi;
    }

    delete pnai;

    return 0;
}

继续阅读：algorithm complexity-theory directed-graph graph optimization

Fast algorithm for counting the number of acyclic paths on a directed graph

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？