How can I fairly choose an item from a list?
Let's say that I have a list of prizes:
PrizeA PrizeB PrizeC
And, for each of them, I want to draw a winner from a list of my attendees.
Give that my attendee list is as follows:
user1, user2, user3, user4, user5
What is an unbiased way to choose a user from that list?
Clearly, I will be using a cryptographically secure pseudo-random number generator, but how do I avoid a bias towards the front of the list? I assume I will not be using modulus?
EDIT
So, here is what I came up with:class SecureRandom
{
private RNGCryptoServicePr开发者_JAVA技巧ovider rng = new RNGCryptoServiceProvider();
private ulong NextUlong()
{
byte[] data = new byte[8];
rng.GetBytes(data);
return BitConverter.ToUInt64(data, 0);
}
public int Next()
{
return (int)(NextUlong() % (ulong)int.MaxValue);
}
public int Next(int maxValue)
{
if (maxValue < 0)
{
throw new ArgumentOutOfRangeException("maxValue");
}
if (maxValue == 0)
{
return 0;
}
ulong chop = ulong.MaxValue - (ulong.MaxValue % (ulong)maxValue);
ulong rand;
do
{
rand = NextUlong();
} while (rand >= chop);
return (int)(rand % (ulong)maxValue);
}
}
BEWARE:
Next()
Returns an int in the range [0, int.MaxValue]
Next(int.MaxValue)
Returns an int in the range [0, int.MaxValue)Pseudocode for special random number generator:
rng is random number generator produces uniform integers from [0, max)
compute m = max modulo length of attendee list
do {
draw a random number r from rng
} while(r >= max - m)
return r modulo length of attendee list
This eliminates the bias to the front part of the list. Then
put the attendees in some data structure indexable by integers
for every prize in the prize list
draw a random number r using above
compute index = r modulo length of attendee list
return the attendee at index
In C#:
public NextUnbiased(Random rg, int max) {
do {
int r = rg.Next();
} while(r >= Int32.MaxValue - (Int32.MaxValue % max));
return r % max;
}
public Attendee SelectWinner(IList<Attendee> attendees, Random rg) {
int winningAttendeeIndex = NextUnbiased(rg, attendees.Length)
return attendees[winningAttendeeIndex];
}
Then:
// attendees is list of attendees
// rg is Random
foreach(Prize prize in prizes) {
Attendee winner = SelectWinner(attendees, rg);
Console.WriteLine("Prize {0} won by {1}", prize.ToString(), winner.ToString());
}
Assuming a fairly distributed random number generator...
do {
i = rand();
} while (i >= RAND_MAX / 5 * 5);
i /= 5;
This gives each of 5 slots
[ 0 .. RAND_MAX / 5 )
[ RAND_MAX / 5 .. RAND_MAX / 5 * 2 )
[ RAND_MAX / 5 * 2 .. RAND_MAX / 5 * 3 )
[ RAND_MAX / 5 * 3 .. RAND_MAX / 5 * 4 )
[ RAND_MAX / 5 * 4 .. RAND_MAX / 5 * 5 )
and discards a roll which falls out of range.
You have already seem several perfectly good answers that depend on knowing the length of the list in advance.
To fairly select a single item from a list without needing to know the length of the list in the first place do this:
if (list.empty()) error_out_somehow
r=list.first() // r is a reference or pointer
s=list.first() // so is s
i = 2
while (r.next() is not NULL)
r=r.next()
if (random(i)==0) s=r // random() returns a uniformly
// drawn integer between 0 and i
i++
return s
(Useful if you list is stored as a linked list)
To distribute prizes in this scenario, just walk down the list of prizes selecting a random winner for each one. (If you want to prevent double winning you then remove the winner from the participant list.)
Why does it work?
- You start with the first item at
1/1
- On the next pass, you select the second item half the time (
1/2
), which means that the first item has probability1 * (2-1)/2 = 1/2
- on further iteration, you select the nth item with probability
1/n
, and the chance for each previous item is reduced by a factor of(n-1)/n
which means that when you come to the end, the chance of having the m
th item in the list (of n
items) is
1/m * m/(m+1) * (m+1)/(m+2) * ... * (n-2)/(n-1) * (n-1)/n = 1/n
and is the same for every item.
If you are paying attention, you'll note that this means walking the whole list every time you want to select an item from the list, so this is not maximally efficient for (say) reordering the whole list (though it does that fairly).
I suppose one answer would be to assign each item a random value, and take the largest or smallest, drilling down as necessary.
I'm not sure if this is the most efficient, tho...
If you're using a good number generator, even with a modulus your bias will be miniscule. If, for instance, you're using a random number generator with 64 bits of entropy and five users, your bias toward the front of the array should be on the order of 3x10^-19 (my numbers may be off, by I don't think by much). That's an extra 3-in-10-quintillion likelihood of the first user winning compared to the later users. That should be good enough to be fair in anyone's book.
You can buy truly random bits from a provider, or use a mechanical device.
Here you will find Oleg Kiselyov's discussion of purely functional random shuffling.
A description of the linked content (quoted from the beginning of that article):
This article will give two pure functional programs that perfectly, randomly and uniformly shuffle a sequence of arbitrary elements. We prove that the algorithms are correct. The algorithms are implemented in Haskell and can trivially be re-written into other (functional) languages. We also discuss why a commonly used sort-based shuffle algorithm falls short of perfect shuffling.
You could use that to shuffle your list and then pick the first item of the shuffled result (or maybe you'd prefer not to give two prizes two the same person -- then use n initial positions of the result, for n = number of prizes); or you could simplify the algorithm to just produce the first item; or you could take a look around that site, because I could have sworn there's an article on picking one random element from an arbitrary tree-like structure with uniform distribution, in a purely functional way, proof of correctness provided, but my search-fu is failing me and I can't seem to find it.
Without truly random bits, you will always have some bias. The number of ways to assign prizes to guests is much larger than any common PRNG's period for even a fairly low number of guests and prizes. As suggested by lpthnc, buy some truly random bits, or buy some random-bit-generating hardware.
As for the algorithm, just do a random shuffle of the guest list. Be careful, as naive shuffling algorithms do have a bias: http://en.wikipedia.org/wiki/Shuffling#Shuffling_algorithms
You can 100% reliably pick a random item from any arbitrary list with a single pass and without knowing how many items are in the list ahead of time.
Psuedo Code:
count = 0.0;
item_selected = none;
foreach item in list
count = count + 1.0;
chance = 1.0 / count;
if ( random( 1.0 ) <= chance ) then item_selected = item;
Test program comparing results of a single rand() % N vs iterating as above:
#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
static inline float frand01()
{
return (float)rand() / (float)RAND_MAX;
}
int _tmain(int argc, _TCHAR* argv[])
{
static const int NUM_ITEMS = 50;
int resultRand[NUM_ITEMS];
int resultIterate[NUM_ITEMS];
memset( resultRand, 0, NUM_ITEMS * sizeof(int) );
memset( resultIterate, 0, NUM_ITEMS * sizeof(int) );
for ( int i = 0; i < 100000; i++ )
{
int choiceRand = rand() % NUM_ITEMS;
int choiceIterate = 0;
float count = 0.0;
for ( int item = 0; item < NUM_ITEMS; item++ )
{
count = count + 1.0f;
float chance = 1.0f / count;
if ( frand01() <= chance )
{
choiceIterate = item;
}
}
resultRand[choiceRand]++;
resultIterate[choiceIterate]++;
}
printf("Results:\n");
for ( int i = 0; i < NUM_ITEMS; i++ )
{
printf( "%02d - %5d %5d\n", i, resultRand[i], resultIterate[i] );
}
return 0;
}
Output:
Results:
00 - 2037 2050
01 - 2038 2009
02 - 2094 1986
03 - 2007 1953
04 - 1990 2142
05 - 1867 1962
06 - 1941 1997
07 - 2023 1967
08 - 1998 2070
09 - 1930 1953
10 - 1972 1900
11 - 2013 1985
12 - 1982 2001
13 - 1955 2063
14 - 1952 2022
15 - 1955 1976
16 - 2000 2044
17 - 1976 1997
18 - 2117 1887
19 - 1978 2020
20 - 1886 1934
21 - 1982 2065
22 - 1978 1948
23 - 2039 1894
24 - 1946 2010
25 - 1983 1927
26 - 1965 1927
27 - 2052 1964
28 - 2026 2021
29 - 2090 1993
30 - 2039 2016
31 - 2030 2009
32 - 1970 2094
33 - 2036 2048
34 - 2020 2046
35 - 2010 1998
36 - 2104 2041
37 - 2115 2019
38 - 1959 1986
39 - 1998 2031
40 - 2041 1977
41 - 1937 2060
42 - 1946 2048
43 - 2014 1986
44 - 1979 2072
45 - 2060 2002
46 - 2046 1913
47 - 1995 1970
48 - 1959 2020
49 - 1970 1997
精彩评论