开发者

Why does java.util.Random use the mask?

Simplified (i.e., leaving concurrency out) Random.next(int bits) looks like

protected int next(int bits) {
    seed = (seed * multiplier + addend) & mask;
    return (int) (seed >>> (48 - bits));
}

where masking gets used to reduce the seed to 48 bits. Why is it better than just

protected int next(i开发者_开发知识库nt bits) {
    seed = seed * multiplier + addend;
    return (int) (seed >>> (64 - bits));
}

? I've read quite a lot about random numbers, but see no reason for this.


The reason is that the lower bits tend to have a lower period (at least with the algorithm Java uses)

From Wikipedia - Linear Congruential Generator:

As shown above, LCG's do not always use all of the bits in the values they produce. The Java implementation produces 48 bits with each iteration but only returns the 32 most significant bits from these values. This is because the higher-order bits have longer periods than the lower order bits (see below). LCG's that use this technique produce much better values than those that do not.

edit:

after further reading (conveniently, on Wikipedia), the values of a, c, and m must satisfy these conditions to have the full period:

  1. c and m must be relatively primes

  2. a-1 is divisible by all prime factors of m

  3. a-1 is a multiple of 4 if m is a multiple of 4

The only one that I can clearly tell is still satisfied is #3. #1 and #2 need to be checked, and I have a feeling that one (or both) of these fail.


From the docs at the top of java.util.Random:

  • The algorithm is described in The Art of Computer Programming,
  • Volume 2 by Donald Knuth in Section 3.2.1. It is a 48-bit seed,
  • linear congruential formula.

So the entire algorithm is designed to operate of 48-bit seeds, not 64 bit ones. I guess you can take it up with Mr. Knuth ;p


From wikipedia (the quote alluded to by the quote that @helloworld922 posted):

A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the nth least significant digit in the base b representation of the output sequence, where bk = m for some integer k, repeats with at most period bn.

And furthermore, it continues (my italics):

The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. Indeed, simply substituting 2n for the modulus term reveals that the low order bits go through very short cycles. In particular, any full-cycle LCG when m is a power of 2 will produce alternately odd and even results.

In the end, the reason is probably historical: the folks at Sun wanted something to work reliably, and the Knuth formula gave 32 significant bits. Note that the java.util.Random API says this (my italics):

If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code. However, subclasses of class Random are permitted to use other algorithms, so long as they adhere to the general contracts for all the methods.

So we're stuck with it as a reference implementation. However that doesn't mean you can't use another generator (and subclass Random or create a new class):

from the same Wikipedia page:

MMIX by Donald Knuth m=264 a=6364136223846793005 c=1442695040888963407

There's a 64-bit formula for you.

Random numbers are tricky (as Knuth notes) and depending on your needs, you might be fine with just calling java.util.Random twice and concatenating the bits if you need a 64-bit number. If you really care about the statistical properties, use something like Mersenne Twister, or if you care about information leakage / unpredictability use java.security.SecureRandom.


It doesn't look like there was a good reason for doing this. Applying the mask is an conservative approach using a proven design. Leaving it out most probably leads to a better generator, however, without knowing the math well, it's a risky step.

Another small advantage of masking is a speed gain on 8-bit architectures, since it uses 6 bytes instead of 8.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜