How can I efficiently model the sum of Bernoullli random variables?

2023-01-30 02:09 问答作者：

I am using Perl to model a random variable (Y) which is the sum of some ~15-40k independent Bernoulli rand开发者_如何学Goom variables (X_i), each with a different success probability (p_i). Formally, Y=Sum{X_i} where Pr(X_i=1)=p_i and Pr(X_i=0)=1-p_i.

I am interested in quickly answering queries such as Pr(Y<=k) (where k is given).

Currently, I use random simulations to answer such queries. I randomly draw each X_i according to its p_i, then sum all X_i values to get Y'. I repeat this process a few thousand times and return the fraction of times Pr(Y'<=k).

Obviously, this is not totally accurate, although accuracy greatly increases as the number of simulations I use increases.

Can you think of a reasonable way to get the exact probability?

First, I would avoid using the rand built-in for this purpose which is too dependent on the underlying C library implementation to be reliable (see, for example, my blog post pointing out that the range of rand on Windows has cardinality 32,768).

To use the Monte-Carlo approach, I would start with a known good random generator, such as Rand::MersenneTwister or just use one of Random.org's services and pre-compute a CDF for Y assuming Y is pretty stable. If each Y is only used once, pre-computing the CDF is obviously pointless.

To quote Wikipedia:

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials.

In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no experiments with success probabilities p1, …, pn. (emphasis mine)

Closed-Form Expression for the Poisson-Binomial Probability Density Function might be of interest. The article is behind a paywall:

and we discuss several of its advantages regarding computing speed and implementation and in simplifying analysis, with examples of the latter including the computation of moments and the development of new trigonometric identities for the binomial coefficient and the binomial cumulative distribution function (cdf).

As far as I recall, shouldn't this end up asymptotically as a normal distribution? See also this newsgroup thread: http://newsgroups.derkeiler.com/Archive/Sci/sci.stat.consult/2008-05/msg00146.html

If so, you can use Statistics::Distrib::Normal.

To obtain the exact solution you can exploit the fact that the probability distribution of the sum of two or more independent random variables is the convolution of their individual distributions. Convolution is a bit expensive but must be calculated only if the p_i change.

Once you have the probability distribution, you can easily obtain the CDF by calculating the cumulative sum of the probabilities.

继续阅读：montecarlo perl simulation statistics

How can I efficiently model the sum of Bernoullli random variables?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？