
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?

I have a stream of (uniform) random bits from which I'd like to generate random integers uniformly in the range [0,n] without wasting bits. (I'm considering bits wasted which are in excess of floor(log_2(n))+1, on the assumption that it's always possible to use no more than that.) E.g., if n = 5, then the algorithm I'm looking for should use no more than three bits. How can this be done?


Let me talk about random integer generating algorithms that are "optimal" in terms of the average number of random bits they use. In the rest of this post, we will assume we have a "true" random generator that can produce unbiased and independent random bits.

In 1976, D. E. Knuth and A. C. Yao showed that any algorithm that produces random integers with a given probability distribution, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) Knuth and Yao showed that any optimal binary tree algorithm for generating integers in [0, n) uniformly will need at least log2(n) and at most log2(n) + 2 bits on average. (Thus, even an optimal algorithm has a chance of "wasting" bits.) See below for examples of optimal algorithms.
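
As a concrete check of these bounds, the average bit count of the optimal Knuth–Yao tree for uniform generation over [0, n) can be computed directly from the binary expansion of 1/n: the tree has n leaves at every depth k where the k-th binary digit of 1/n is 1, and each such leaf contributes k·2^(−k) to the expectation. A small sketch of this calculation (my own illustration, not code from the paper):

```javascript
// Average number of random bits used by the optimal (Knuth-Yao)
// tree for a uniform integer in [0, n). The tree has n leaves at
// each depth k where the k-th binary digit of 1/n is 1; each such
// leaf contributes k * 2^(-k) to the expected bit count.
function knuthYaoExpectedBits(n, maxDepth) {
  maxDepth = maxDepth || 200;
  var expected = 0;
  var r = 1; // remainder of the long division of 1 by n in base 2
  for (var k = 1; k <= maxDepth; k++) {
    r *= 2;
    if (r >= n) { // the k-th binary digit of 1/n is 1
      r -= n;
      expected += n * k * Math.pow(2, -k);
    }
  }
  return expected;
}
```

For example, `knuthYaoExpectedBits(4)` is exactly 2 (a power of 2 wastes nothing), while `knuthYaoExpectedBits(3)` is 8/3 ≈ 2.67, which indeed lies between log2(3) ≈ 1.58 and log2(3) + 2.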

However, any optimal integer generator that is also unbiased will, in general, run forever in the worst case, as also shown by Knuth and Yao. Going back to the binary tree, each one of the n outcomes labels leaves in the binary tree so that each integer in [0, n) can occur with probability 1/n. But if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either—

  • have an "infinite" depth, or
  • include "rejection" leaves at the end of the tree,

and in either case, the algorithm will run forever in the worst case, even if it uses very few random bits on average. (On the other hand, when n is a power of 2, the optimal binary tree has no rejection leaves and requires exactly log2(n) bits before returning an outcome, so that no bits are "wasted".) The Fast Dice Roller is an example of an algorithm that uses "rejection" events to ensure it's unbiased; see the comment in the code below.

Thus, in general, a random integer generator can be either unbiased or constant-time (or even neither), but not both. And the binary tree concept shows that there is no way in general to "fix" the worst case of an indefinite running time without introducing bias. For instance, modulo reductions (e.g., rand() % n) are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (However, this bias may be negligible depending on the application. There are also security aspects to random integer generation, which are too complicated to discuss in this answer.)
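
To see this bias concretely (my own illustration, using n = 6 and blocks of 3 random bits):

```javascript
// Map each of the 8 equally likely 3-bit values into [0, 6) by
// modulo reduction and count how often each outcome occurs. Equal
// counts are impossible because 8 is not a multiple of 6.
var counts = [0, 0, 0, 0, 0, 0];
for (var v = 0; v < 8; v++) {
  counts[v % 6] += 1;
}
// counts is now [2, 2, 1, 1, 1, 1]: outcomes 0 and 1 are twice as
// likely as the others.
```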

Fast Dice Roller Implementation

There are many examples of optimal algorithms in the sense given earlier. One of them is the Fast Dice Roller by J. Lumbroso (2013) (implemented below); other examples include the algorithm given in one of the other answers here and the algorithm given in the Math Forum in 2004. On the other hand, none of the algorithms surveyed by M. O'Neill are optimal, since they rely on generating blocks of random bits at a time. See also my note on integer generating algorithms.

The following is a JavaScript implementation of the Fast Dice Roller. Note that it uses rejection events and a loop to ensure it's unbiased. nextBit() is a method that produces an independent unbiased random bit (e.g., Math.random()<0.5 ? 1 : 0, which isn't necessarily efficient in terms of the underlying random bits it consumes).

function randomInt(minInclusive, maxExclusive) {
 var maxInclusive = (maxExclusive - minInclusive) - 1
 var x = 1
 var y = 0
 while(true) {
    x = x * 2
    var randomBit = nextBit()
    y = y * 2 + randomBit
    if(x > maxInclusive) {
      if (y <= maxInclusive) { return y + minInclusive }
      // Rejection
      x = x - maxInclusive - 1
      y = y - maxInclusive - 1
    }
 }
}
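
For completeness, here is one possible nextBit() together with a quick usage check (the Fast Dice Roller is repeated so the sketch is self-contained; Math.random() here is only for illustration):

```javascript
// A possible bit source; Math.random() is not a cryptographic source
// and is used here only for illustration.
function nextBit() {
  return Math.random() < 0.5 ? 1 : 0;
}

// Fast Dice Roller, as above.
function randomInt(minInclusive, maxExclusive) {
  var maxInclusive = (maxExclusive - minInclusive) - 1;
  var x = 1;
  var y = 0;
  while (true) {
    x = x * 2;
    y = y * 2 + nextBit();
    if (x > maxInclusive) {
      if (y <= maxInclusive) { return y + minInclusive; }
      // Rejection: fold the out-of-range part back and keep drawing bits.
      x = x - maxInclusive - 1;
      y = y - maxInclusive - 1;
    }
  }
}

// Usage: simulate a die numbered 2..7; every result must land in [2, 8).
for (var i = 0; i < 1000; i++) {
  var roll = randomInt(2, 8);
  if (roll < 2 || roll >= 8) { throw new Error("out of range: " + roll); }
}
```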

The following version returns a BigInt, an arbitrary-precision integer supported in recent versions of JavaScript:

function randomInt(minInclusive, maxExclusive) {
 minInclusive=BigInt(minInclusive)
 maxExclusive=BigInt(maxExclusive)
 var maxInclusive = (maxExclusive - minInclusive) - BigInt(1)
 var x = BigInt(1)
 var y = BigInt(0)
 while(true) {
    x = x * BigInt(2)
    var randomBit = BigInt(Math.random()<0.5 ? 1 : 0)
    y = y * BigInt(2) + randomBit
    if(x > maxInclusive) {
      if (y <= maxInclusive) { return y + minInclusive }
      // Rejection
      x = x - maxInclusive - BigInt(1)
      y = y - maxInclusive - BigInt(1)
    }
 }
}

Reducing Bit Waste

Recall that "optimal" integer generators, such as the Fast Dice Roller above, use at least log2(n) bits (the lower bound) on average and come within 2 bits of that lower bound. There are various techniques that can be used to bring an algorithm (even a less than optimal one) closer to this theoretical lower bound, including batching and randomness extraction. These are discussed in:

  • The Fast Dice Roller paper itself, see section 3.1 (batching).
  • The paper "Random variate generation using only finitely many unbiased, independently and identically distributed random bits" by Devroye and Gravel, section 2.3 (randomness extraction).
  • The Math Forum page given above (recycling).
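
As a sketch of the batching idea (my own example, not taken from the papers above): to produce k dice rolls, draw one uniform integer from [0, 6^k) and read off its base-6 digits, so the amortized cost per roll approaches log2(6) ≈ 2.585 bits rather than the 3 bits a per-roll generator might use.

```javascript
// Decompose one uniform draw from [0, 6^k) into k independent
// uniform dice rolls in [0, 6) by reading off its base-6 digits.
function batchToRolls(value, k) {
  var rolls = [];
  for (var i = 0; i < k; i++) {
    rolls.push(value % 6);
    value = Math.floor(value / 6);
  }
  return rolls;
}
```

For example, a single draw of 35 from [0, 36) decomposes into the two rolls [5, 5], and a draw of 7 into [1, 1].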


This is equivalent to finding a two-way (one-to-one) function between two sets of different (finite) cardinality, which is impossible.


Although your question description specifies a fixed number of bits per random number generated, your title does not. So I am going to add here that, on average, you can generate a random number with the number of bits you state plus half a bit. The algorithm below consumes a variable number of bits when n + 1 is not a power of 2, but the average number of bits it consumes is floor(log_2(n)) + 1.5.

Standard implementations of the function to generate an integer in a range use % (modulo) on a large random number. This wastes bits and will not produce a mathematically exact uniform distribution unless it is rerun for some values of the large random number. The following algorithm produces a true uniform distribution and will not waste bits. (Or rather, I do not see an obvious way to reduce the number of bits it consumes. Maybe some entropy could be recovered from the 'number too large' occurrences.)

# Generate a number from 0 to n inclusive without wasting bits.
function RandomInteger(n)
    if n <= 0
        error
    else
        i = Floor(Log2(n))
        x = i
        r = 0
        while x >= 0
            r = r + (2 ^ x) * NextRandomBit()
            if r > n 
                # Selected number too large so begin again.
                x = i 
                r = 0
            else
                # Still in range. Calculate the next bit.
                x = x - 1
        return r

The algorithm above is written for clarity not speed. It would be very fast if rewritten to process multiple bits at once.
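
For reference, a direct JavaScript transcription of the pseudocode (the bit source is passed in as a parameter so the function can be exercised with a fixed bit sequence; the function name is my own):

```javascript
// Generate an integer in [0, n] inclusive. Restarting as soon as the
// partial result exceeds n is equivalent to full rejection, because
// the remaining (lower-order) bits can only increase r.
function randomInteger(n, nextBit) {
  if (n <= 0) { throw new Error("n must be positive"); }
  var i = Math.floor(Math.log2(n));
  var x = i;
  var r = 0;
  while (x >= 0) {
    r = r + Math.pow(2, x) * nextBit();
    if (r > n) {
      // Selected number too large, so begin again.
      x = i;
      r = 0;
    } else {
      // Still in range; move on to the next bit.
      x = x - 1;
    }
  }
  return r;
}
```

Feeding it the bits 1, 0, 1 with n = 5 yields 5; feeding 1, 1, 0, 1, 1 triggers one restart and yields 3.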


It seems like you could just take x = ceil(log_2(n)) bits at a time and use these as your random numbers. The problem you'll encounter is that if the number you receive is greater than your limit (e.g. 5), you'll want to perform some magic to get it below 5, but uniformly. In this case, what seems logical is to just take another x bits, but since you've specified that we can't waste bits, we have to be more creative. I would recommend a right or left rotate, but this isn't always going to get you out of the situation (consider a string of 111 when you wanted n = 5). We could do up to x rotates to see if one of them gets us into the correct range, or we could just flip all of the bits and add 1 (two's complement). I believe this will make it uniform.

So, for example, if you had the following string (rightmost bit is the first one you receive):

101001111010010101

And you're using n=5, then ceil(log2(n)) = 3, so you'll use three bits at a time, and the following will be your results (at each time step):

t=0: 101 = 5
t=1: 010 = 2
t=2: 010 = 2
t=3: 111 = 7 -> too large, rotates won't work, so we use 2's complement: 001 = 1
t=4: 001 = 1
t=5: 101 = 5
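
Reading the stream rightmost-first in 3-bit groups, with the flip-and-add-one fallback for out-of-range values, reproduces the table above (a sketch of this answer's scheme; whether the resulting distribution is truly uniform is the open question here):

```javascript
// Consume the bit string three bits at a time, rightmost group first.
// Values above n are replaced by their 3-bit two's complement
// (flip all bits and add 1, modulo 8).
function decode(bits, n) {
  var results = [];
  var s = bits;
  while (s.length >= 3) {
    var chunk = s.slice(-3); // the next three bits received
    s = s.slice(0, -3);
    var v = parseInt(chunk, 2);
    if (v > n) {
      v = (~v + 1) & 7; // two's complement in 3 bits
    }
    results.push(v);
  }
  return results;
}
// decode("101001111010010101", 5) gives [5, 2, 2, 1, 1, 5].
```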


First find out the number of possible values you want to generate. In the case of integers in the range 0..5, that's 6 values. They can be represented in ceil( log(6)/log(2) ) = 3 bits.

// in C++
std::bitset< 3 > bits;
// fill the bitset

// interpret as a number
long value = bits.to_ulong();

Then find the transformation from n bits to the final representation format: it needs to be scaled from the range [0..2^N-1] to the range [from,to]:

double out_from = -1, out_to = 5;
double in_from = 0, in_to = std::bitset<3>().flip().to_ulong(); // 2^3 - 1 = 7

double factor   = (out_to - out_from) / (in_to - in_from);
double constant = out_from - in_from * factor;

double rescaled = in_value * factor + constant; // in_value: the bits as a number
long out = floor( rescaled );
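
The same rescaling can be sketched in JavaScript (note that squeezing 2^N equally likely bit patterns onto a range whose size is not a power of two this way cannot be exactly uniform, as discussed in the earlier answers):

```javascript
// Rescale an N-bit value in [0, 2^N - 1] linearly onto the real
// interval [outFrom, outTo], then floor to get an integer.
function rescale(value, bitCount, outFrom, outTo) {
  var inTo = Math.pow(2, bitCount) - 1; // largest N-bit value
  var rescaled = outFrom + (value * (outTo - outFrom)) / inTo;
  return Math.floor(rescaled);
}
```

With 3 bits and the range [-1, 5]: an input of 0 maps to -1, 7 maps to 5, and 3 maps to 1.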
