开发者

generating poisson variables in c++

I implemented this function to generate a poisson random variable

typedef long unsigned int luint;
luint poisson(luint lambda) {
    double L = exp(-double(lambda));
    luint k = 0;
    double p = 1;
    do {
        k++;
        p *= mrand.rand();
    } while( p > L);
    return (k-1);
}

where mrand is the MersenneTwister random number generator. I find that, as I increase 开发者_如何学Golambda, the expected distribution is going to be wrong, with a mean that saturates at around 750. Is it due to numerical approximations or did I make any mistakes?


If you go the "existing library" route, your compiler may already support the C++11 std::random package. Here is how you use it:

#include <random>
#include <ctime>
#include <iostream>

std::mt19937 mrand(std::time(0));  // seed however you want

typedef long unsigned int luint;

luint poisson(luint lambda)
{
    std::poisson_distribution<luint> d(lambda);
    return d(mrand);
}

int main()
{
    std::cout << poisson(750) << '\n';
    std::poisson_distribution<luint> d(750);
    std::cout << d(mrand) << '\n';
    std::cout << d(mrand) << '\n';
}

I've used it two ways above:

  1. I tried to imitate your existing interface.

  2. If you create a std::poisson_distribution with a mean, it is more efficient to use that distribution over and over for the same mean (as done in main()).

Here is sample output for me:

751
730
779


exp(-750) is a very small number, very close to the smallest possible double, so your issue is numerical. In any case, your complexity will be linear in lambda, so the algorithm isn't very efficient for high lambda. Unless you have a great reason to code this yourself, using an existing library implementation probably makes sense, as these numerical algorithms tend to be touchy precisely for the precision issues you're encountering.


Since you only use L in the expression (p>L), you're essentially testing for (log(p) > -lambda). That's not a very helpful transformation. Sure, you don't need exp(-750) anymore, but you'll just overflow p instead.

Now, p is just Π(mrand.rand()), and log(p) is log(Π(mrand.rand())) is Σ(log(mrand.rand()). That gives you the necessary transformation:

double logp = 0;
do {
    k++;
    logp += log(mrand.rand());
} while( logp > -lambda);

double has only 11 bits of exponent, but a 52 bits mantissa. Therefore this is a massive increase in numerical stability. The price paid is that you need a log on every iteration, instead of a single exp up front.


From another question I asked earlier, it seems you could also approximate poisson(750) as poisson(375) + poisson(375).


In situations like these, you don't need to invoke the random number generator more than once. All you need is a table of cumulative probabilities:

double c[k] = // the probability that X <= k (k = 0,...)

Then generate a random number 0 <= r < 1, and take the first integer X such that c[X] > r. You can find this X with a binary search.

To generate this table, we need the individual probabilities

p[k] = lambda^k / (k! e^lambda) // // the probability that X = k

If lambda is large, this becomes wildly inaccurate, as you have found. But we can use a trick here: start at (or near) the largest value, with k = floor[lambda], and pretend for the moment that p[k] is equal to 1. Then calculate p[i] for i > k using the recurrence relation

p[i+1] = (p[i]*lambda) / (i+1)

and for i < k using

p[i-1] = (p[i]*i)/lambda

This ensures that the largest probabilities have the greatest possible precision.

Now just calculate c[i] using c[i+1] = c[i] + p[i+1], up to the point where c[i+1] is the same as c[i]. Then you can normalise the array by dividing by this limiting value c[i]; or you can leave the array as it is, and use a random number 0 <= r < c[i].

See: http://en.wikipedia.org/wiki/Inverse_transform_sampling

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜