Generating Poisson variables in C++
I implemented this function to generate a Poisson random variable:

    typedef long unsigned int luint;

    luint poisson(luint lambda) {
        double L = exp(-double(lambda));
        luint k = 0;
        double p = 1;
        do {
            k++;
            p *= mrand.rand();
        } while (p > L);
        return (k - 1);
    }
where mrand is the MersenneTwister random number generator. I find that, as I increase lambda, the resulting distribution goes wrong, with a mean that saturates at around 750. Is this due to numerical approximation, or did I make a mistake?
If you go the "existing library" route, your compiler may already support the C++11 <random> header. Here is how you use it:
    #include <random>
    #include <ctime>
    #include <iostream>

    std::mt19937 mrand(std::time(0)); // seed however you want

    typedef long unsigned int luint;

    luint poisson(luint lambda)
    {
        std::poisson_distribution<luint> d(lambda);
        return d(mrand);
    }

    int main()
    {
        std::cout << poisson(750) << '\n';
        std::poisson_distribution<luint> d(750);
        std::cout << d(mrand) << '\n';
        std::cout << d(mrand) << '\n';
    }
I've used it two ways above:
I tried to imitate your existing interface.
If you create a std::poisson_distribution with a mean, it is more efficient to use that distribution over and over for the same mean (as done in main()).
Here is sample output for me:
751
730
779
exp(-750) is a very small number, smaller than the smallest positive double (about 4.9e-324), so it underflows to 0 and your issue is numerical. In any case, your complexity will be linear in lambda, so the algorithm isn't very efficient for high lambda. Unless you have a great reason to code this yourself, using an existing library implementation probably makes sense, as these numerical algorithms tend to be touchy precisely because of the precision issues you're encountering.
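A quick way to see the underflow is to evaluate the threshold directly. The sketch below is only an illustration; the helper name threshold_underflows is made up for this example and is not part of the question's code. Once the threshold is exactly 0, the loop in the question can only stop when p itself underflows to 0, which would be consistent with the mean saturating somewhere near 750.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the question): reports whether exp(-lambda)
// is too small to be stored as a positive double, in which case the
// comparison (p > L) in the original loop degenerates to (p > 0).
bool threshold_underflows(double lambda) {
    assert(lambda >= 0.0);
    return std::exp(-lambda) == 0.0;
}
```

With IEEE 754 doubles, threshold_underflows(700.0) should be false and threshold_underflows(750.0) should be true.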
Since you only use L in the expression (p > L), you're essentially testing for (log(p) > -lambda). On its own that's not a very helpful transformation: sure, you don't need exp(-750) anymore, but p itself will just underflow instead.
Now, p is just Π(mrand.rand()), so log(p) = log(Π(mrand.rand())) = Σ(log(mrand.rand())). That gives you the necessary transformation:
    double logp = 0;
    do {
        k++;
        logp += log(mrand.rand());
    } while (logp > -lambda);
double has only 11 bits of exponent, but a 52-bit mantissa. Therefore this is a massive increase in numerical stability. The price paid is that you need a log on every iteration, instead of a single exp up front.
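For reference, here is how the log-domain loop might look as a complete function. This is a minimal sketch that assumes std::mt19937 and std::uniform_real_distribution in place of the question's mrand.rand(); the name poisson_log is invented for the example.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Sketch of the log-domain variant: sum log(u) instead of multiplying u.
// Assumes a C++11 generator instead of the question's mrand object.
unsigned long poisson_log(double lambda, std::mt19937& gen) {
    assert(lambda >= 0.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    unsigned long k = 0;
    double logp = 0.0;
    do {
        ++k;
        // If unif returns exactly 0, log gives -inf and the loop simply ends.
        logp += std::log(unif(gen));
    } while (logp > -lambda);
    return k - 1;
}
```

The running time is still linear in lambda, so this fixes the precision problem but not the cost for large means.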
From another question I asked earlier, it seems you could also generate poisson(750) as poisson(375) + poisson(375), since the sum of two independent Poisson variables is again Poisson, with the means added.
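The splitting idea can be sketched as follows. The chunk size of 500 is an arbitrary assumption, chosen only so that exp(-chunk) stays comfortably representable, and both function names are hypothetical. The generator is std::mt19937 rather than the question's mrand.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Multiplicative method from the question, restricted to means small enough
// that exp(-lambda) does not underflow (the 500 limit is an assumption).
unsigned long poisson_knuth(double lambda, std::mt19937& gen) {
    assert(lambda >= 0.0 && lambda <= 500.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double L = std::exp(-lambda);
    unsigned long k = 0;
    double p = 1.0;
    do {
        ++k;
        p *= unif(gen);
    } while (p > L);
    return k - 1;
}

// Draw Poisson(lambda) as a sum of independent Poisson draws whose means
// add up to lambda.
unsigned long poisson_chunked(double lambda, std::mt19937& gen) {
    assert(lambda >= 0.0);
    const double chunk = 500.0;
    unsigned long total = 0;
    while (lambda > chunk) {
        total += poisson_knuth(chunk, gen);
        lambda -= chunk;
    }
    return total + poisson_knuth(lambda, gen);
}
```

For example, poisson_chunked(750.0, gen) draws one Poisson(500) and one Poisson(250) value and adds them.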
In situations like these, you don't need to invoke the random number generator more than once per sample. All you need is a table of cumulative probabilities:

    c[k] = P(X <= k)    (k = 0, 1, ...)

Then generate a random number 0 <= r < 1, and take the first integer X such that c[X] > r. You can find this X with a binary search.
To generate this table, we need the individual probabilities

    p[k] = lambda^k / (k! e^lambda)    // the probability that X = k

If lambda is large, evaluating this directly becomes wildly inaccurate, as you have found. But we can use a trick here: start at (or near) the largest value, with k = floor(lambda), and pretend for the moment that p[k] is equal to 1. Then calculate p[i] for i > k using the recurrence relation

    p[i+1] = (p[i]*lambda) / (i+1)

and for i < k using

    p[i-1] = (p[i]*i) / lambda

This ensures that the largest probabilities have the greatest possible precision.
Now just calculate c[i] using c[i+1] = c[i] + p[i+1], up to the point where c[i+1] is the same as c[i]. Then you can normalise the array by dividing by this limiting value c[i]; or you can leave the array as it is, and use a random number 0 <= r < c[i].
See: http://en.wikipedia.org/wiki/Inverse_transform_sampling
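The table construction described above could be sketched like this. It is an illustration under a few assumptions, not a definitive implementation: the PoissonTable type and both function names are invented here, the table is truncated once a weight drops below 1e-17 relative to the weight at floor(lambda) (a stand-in for "until c[i+1] equals c[i]"), and the array is left unnormalised, with r drawn from [0, c_last).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical container: cdf[j] is the unnormalised running sum of the
// weights for X = first, first + 1, ..., first + j.
struct PoissonTable {
    unsigned long first;
    std::vector<double> cdf;
};

PoissonTable make_poisson_table(double lambda) {
    assert(lambda > 0.0);
    const unsigned long mode = static_cast<unsigned long>(std::floor(lambda));
    const double eps = 1e-17;  // assumed cutoff, relative to the weight 1 at mode

    // Walk down from the mode: p[i-1] = p[i] * i / lambda.
    std::vector<double> down;
    double w = 1.0;
    for (unsigned long i = mode; i > 0; --i) {
        w *= static_cast<double>(i) / lambda;
        if (w < eps) break;
        down.push_back(w);  // weight of X = i - 1
    }

    // Walk up from the mode: p[i+1] = p[i] * lambda / (i + 1).
    std::vector<double> up;
    w = 1.0;
    for (unsigned long i = mode;; ++i) {
        w *= lambda / static_cast<double>(i + 1);
        if (w < eps) break;
        up.push_back(w);  // weight of X = i + 1
    }

    PoissonTable t;
    t.first = mode - static_cast<unsigned long>(down.size());
    double sum = 0.0;
    for (auto it = down.rbegin(); it != down.rend(); ++it) {
        sum += *it;
        t.cdf.push_back(sum);
    }
    sum += 1.0;  // the pretend weight p[mode] = 1
    t.cdf.push_back(sum);
    for (double x : up) {
        sum += x;
        t.cdf.push_back(sum);
    }
    return t;
}

// One uniform draw per sample; binary search for the first c[X] > r.
unsigned long sample_poisson(const PoissonTable& t, std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, t.cdf.back());
    const double r = unif(gen);
    auto it = std::upper_bound(t.cdf.begin(), t.cdf.end(), r);
    if (it == t.cdf.end()) --it;  // guard against r landing on the upper bound
    return t.first + static_cast<unsigned long>(it - t.cdf.begin());
}
```

The table is built once per lambda and then reused, so each sample costs one random number plus a binary search.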