How should I use random.jumpahead in Python

2022-12-25 16:05 问答作者：

I have a application that does a certain experiment 1000 times (multi-threaded, so that multiple experiments are done at the same time). Every experiment needs appr. 50.000 random.random() calls.

What is the best approach to get this really random. I could copy a random object to every experiment and do than a jumpahead of 50.000 * expid. The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?

Or is there another way to do this in 'the best way'?

(No, the random numbers are not used for security, but for a metropolis hasting algorithm. The only requirement 开发者_StackOverflow社区is that the experiments are independent, not whether the random sequence is somehow predictable or so)

I could copy a random object to every experiment and do than a jumpahead of 50.000 * expid.

Approximately correct. Each thread gets their own Random instance.

Seed all of them to the same seed value. Use a constant to test, use /dev/random when you "run for the record".

Edit. Outside Python and in older implementations, use jumpahead( 50000 * expid ) to avoid the situation where two generators wind up with a parallel sequences of values. In any reasonably current (post 2.3) Python, jumpahead is no longer linear and using expid is sufficient to scramble the state.

You can't simply do jumpahead(1) in each thread, since that will assure they are synchronized. Use jumpahead( expid ) to assure each thread is distinctly scrambled.

The documentation suggests that jumpahead(1) already scrambles the state, but is that really true?

Yes, jumpahead does indeed "scramble" the state. Recall that for a given seed you get one -- long -- but fixed sequence of pseudo-random numbers. You're jumping ahead in this sequence. To pass randomness tests, you must get all your values from this one sequence.

Edit. Once upon a time, jumpahead(1) was limited. Now jumpahead(1) really does a larger scrambling. The scrambling, however, is deterministic. You can't simply do jumpahead(1) in each thread.

If you have multiple generators with different seeds, you violate the "one sequence from one seed" assumption and your numbers aren't going to be as random as if you get them from a single sequence.

If you only jumphead 1, you'll may be getting parallel sequences which will might be similar. [This similarity might not be detectable; theoretically, there's a similarity.]

When you jumpahead 50,000, you assure that you follow the 1-sequence-1-seed premise. You also assure that you won't have adjacent sequences of numbers in two experiments.

Finally, you also have repeatability. For a given seed, you get consistent results.

Same jumpahead: Not Good.

>>> y=random.Random( 1 )
>>> z=random.Random( 1 )
>>> y.jumpahead(1)
>>> z.jumpahead(1)
>>> [ y.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
>>> [ z.random() for i in range(5) ]
[0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]

You shouldn't use that function. There is no proof it can work on Mersenne Twister generator. Indeed, it was removed from Python 3 for that reason.

For more information about generation of pseudo-random numbers on parallel environments, see this article from David Hill.

jumpahead(1) is indeed sufficient (and identical to jumpahead(50000) or any other such call, in the current implementation of random -- I believe that came in at the same time as the Mersenne Twister based implementation). So use whatever argument fits in well with your programs' logic. (Do use a separate random.Random instance per thread for thread-safety purposes of course, as your question already hints).

(random module generated numbers are not meant to be cryptographically strong, so it's a good thing that you're not using for security purposes;-).

Per the random module docs at python.org:

"You can instantiate your own instances of Random to get generators that don’t share state."

And there's also a relevant-looking note on jumpahead, as you mention. But the guarantees there are kind of vague. If the calls to OS-provided randomness aren't so expensive as to dominate your running time, I'd skip all the subtlety and do something like:

randoms = [random.Random(os.urandom(4)) for _ in range(num_expts)]

If num_expts is ~1000, then you're unlikely to have any collisions in your seed (birthday paradox says you need about 65000 experiments before there's a >50% probability that you have a collision). If this isn't good enough for you or if the number of experiments is more like 100k instead of 1k, then I think it's reasonable to follow this up with

for idx, r in enumerate(randoms):
  r.jumpahead(idx)

Note that I don't think it will work to just make your seed longer (os.urandom(8), for example), since the random docs state that the seed must be hashable, and so on a 32-bit platform you're only going to get at most 32 bits (4 bytes) of useful entropy in your seed.

This question piqued my curiosity, so I went and looked at the code implementing the random module. I am definitely not a PRNG expert, but it does seem like slightly differing values of n in jumpahead(n) will lead to markedly different Random instance states. (Always scary to contradict Alex Martelli, but the code does use the value of n when shuffling the random state).

继续阅读：python random

How should I use random.jumpahead in Python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？