How can I fairly choose an item from a list?

2022-12-14 17:46 问答作者：

Let's say that I have a list of prizes:

PrizeA PrizeB PrizeC

And, for each of them, I want to draw a winner from a list of my attendees.

Give that my attendee list is as follows:

user1, user2, user3, user4, user5

What is an unbiased way to choose a user from that list?

Clearly, I will be using a cryptographically secure pseudo-random number generator, but how do I avoid a bias towards the front of the list? I assume I will not be using modulus?

EDIT

So, here is what I came up with:

class SecureRandom
{
    private RNGCryptoServicePr开发者_JAVA技巧ovider rng = new RNGCryptoServiceProvider();

    private ulong NextUlong()
    {
        byte[] data = new byte[8];
        rng.GetBytes(data);
        return BitConverter.ToUInt64(data, 0);
    }

    public int Next()
    {
        return (int)(NextUlong() % (ulong)int.MaxValue);
    }

    public int Next(int maxValue)
    {
        if (maxValue < 0)
        {
            throw new ArgumentOutOfRangeException("maxValue");
        }

        if (maxValue == 0)
        {
            return 0;
        }

        ulong chop = ulong.MaxValue - (ulong.MaxValue % (ulong)maxValue);

        ulong rand;

        do
        {
            rand = NextUlong();
        } while (rand >= chop);

        return (int)(rand % (ulong)maxValue);
    }
}

BEWARE:

Next() Returns an int in the range [0, int.MaxValue]

Next(int.MaxValue) Returns an int in the range [0, int.MaxValue)

Pseudocode for special random number generator:

rng is random number generator produces uniform integers from [0, max)
compute m = max modulo length of attendee list
do {
    draw a random number r from rng
} while(r >= max - m)
return r modulo length of attendee list

This eliminates the bias to the front part of the list. Then

put the attendees in some data structure indexable by integers
for every prize in the prize list
draw a random number r using above
compute index = r modulo length of attendee list
return the attendee at index

In C#:

public NextUnbiased(Random rg, int max) {
    do {
        int r = rg.Next();
    } while(r >= Int32.MaxValue - (Int32.MaxValue % max));
    return r % max;
}

public Attendee SelectWinner(IList<Attendee> attendees, Random rg) {    
    int winningAttendeeIndex = NextUnbiased(rg, attendees.Length)
    return attendees[winningAttendeeIndex];
}

Then:

// attendees is list of attendees
// rg is Random
foreach(Prize prize in prizes) {
    Attendee winner = SelectWinner(attendees, rg);
    Console.WriteLine("Prize {0} won by {1}", prize.ToString(), winner.ToString());
}

Assuming a fairly distributed random number generator...

do {
    i = rand();
} while (i >= RAND_MAX / 5 * 5);
i /= 5;

This gives each of 5 slots

[ 0 .. RAND_MAX / 5 )
[ RAND_MAX / 5 .. RAND_MAX / 5 * 2 )
[ RAND_MAX / 5 * 2 .. RAND_MAX / 5 * 3 )
[ RAND_MAX / 5 * 3 .. RAND_MAX / 5 * 4 )
[ RAND_MAX / 5 * 4 .. RAND_MAX / 5 * 5 )

and discards a roll which falls out of range.

You have already seem several perfectly good answers that depend on knowing the length of the list in advance.

To fairly select a single item from a list without needing to know the length of the list in the first place do this:

 if (list.empty()) error_out_somehow
 r=list.first()          // r is a reference or pointer
 s=list.first()          // so is s
 i = 2
 while (r.next() is not NULL)
    r=r.next()
    if (random(i)==0) s=r  // random() returns a uniformly
                           // drawn integer between 0 and i
    i++
 return s

(Useful if you list is stored as a linked list)

To distribute prizes in this scenario, just walk down the list of prizes selecting a random winner for each one. (If you want to prevent double winning you then remove the winner from the participant list.)

Why does it work?

You start with the first item at 1/1
On the next pass, you select the second item half the time (1/2), which means that the first item has probability 1 * (2-1)/2 = 1/2
on further iteration, you select the nth item with probability 1/n, and the chance for each previous item is reduced by a factor of (n-1)/n

which means that when you come to the end, the chance of having the mth item in the list (of n items) is

1/m * m/(m+1) * (m+1)/(m+2) * ... * (n-2)/(n-1) * (n-1)/n = 1/n

and is the same for every item.

If you are paying attention, you'll note that this means walking the whole list every time you want to select an item from the list, so this is not maximally efficient for (say) reordering the whole list (though it does that fairly).

I suppose one answer would be to assign each item a random value, and take the largest or smallest, drilling down as necessary.

I'm not sure if this is the most efficient, tho...

If you're using a good number generator, even with a modulus your bias will be miniscule. If, for instance, you're using a random number generator with 64 bits of entropy and five users, your bias toward the front of the array should be on the order of 3x10^-19 (my numbers may be off, by I don't think by much). That's an extra 3-in-10-quintillion likelihood of the first user winning compared to the later users. That should be good enough to be fair in anyone's book.

You can buy truly random bits from a provider, or use a mechanical device.

Here you will find Oleg Kiselyov's discussion of purely functional random shuffling.

A description of the linked content (quoted from the beginning of that article):

This article will give two pure functional programs that perfectly, randomly and uniformly shuffle a sequence of arbitrary elements. We prove that the algorithms are correct. The algorithms are implemented in Haskell and can trivially be re-written into other (functional) languages. We also discuss why a commonly used sort-based shuffle algorithm falls short of perfect shuffling.

You could use that to shuffle your list and then pick the first item of the shuffled result (or maybe you'd prefer not to give two prizes two the same person -- then use n initial positions of the result, for n = number of prizes); or you could simplify the algorithm to just produce the first item; or you could take a look around that site, because I could have sworn there's an article on picking one random element from an arbitrary tree-like structure with uniform distribution, in a purely functional way, proof of correctness provided, but my search-fu is failing me and I can't seem to find it.

Without truly random bits, you will always have some bias. The number of ways to assign prizes to guests is much larger than any common PRNG's period for even a fairly low number of guests and prizes. As suggested by lpthnc, buy some truly random bits, or buy some random-bit-generating hardware.

As for the algorithm, just do a random shuffle of the guest list. Be careful, as naive shuffling algorithms do have a bias: http://en.wikipedia.org/wiki/Shuffling#Shuffling_algorithms

You can 100% reliably pick a random item from any arbitrary list with a single pass and without knowing how many items are in the list ahead of time.

Psuedo Code:

count = 0.0;
item_selected = none;

foreach item in list
 count = count + 1.0;
 chance = 1.0 / count;
 if ( random( 1.0 ) <= chance ) then item_selected = item;

Test program comparing results of a single rand() % N vs iterating as above:

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>

static inline float frand01()
{
    return (float)rand() / (float)RAND_MAX;
}

int _tmain(int argc, _TCHAR* argv[])
{
    static const int NUM_ITEMS = 50;

    int resultRand[NUM_ITEMS];
    int resultIterate[NUM_ITEMS];

    memset( resultRand, 0, NUM_ITEMS * sizeof(int) );
    memset( resultIterate, 0, NUM_ITEMS * sizeof(int) );

    for ( int i = 0; i < 100000; i++ )
    {
        int choiceRand = rand() % NUM_ITEMS;

        int choiceIterate = 0;
        float count = 0.0;
        for ( int item = 0; item < NUM_ITEMS; item++ )
        {
            count = count + 1.0f;
            float chance = 1.0f / count;
            if ( frand01() <= chance )
            {
                choiceIterate = item;
            }
        }

        resultRand[choiceRand]++;
        resultIterate[choiceIterate]++;
    }

    printf("Results:\n");
    for ( int i = 0; i < NUM_ITEMS; i++ )
    {
        printf( "%02d - %5d %5d\n", i, resultRand[i], resultIterate[i] );
    }

    return 0;
}

Output:

Results:
00 -  2037  2050
01 -  2038  2009
02 -  2094  1986
03 -  2007  1953
04 -  1990  2142
05 -  1867  1962
06 -  1941  1997
07 -  2023  1967
08 -  1998  2070
09 -  1930  1953
10 -  1972  1900
11 -  2013  1985
12 -  1982  2001
13 -  1955  2063
14 -  1952  2022
15 -  1955  1976
16 -  2000  2044
17 -  1976  1997
18 -  2117  1887
19 -  1978  2020
20 -  1886  1934
21 -  1982  2065
22 -  1978  1948
23 -  2039  1894
24 -  1946  2010
25 -  1983  1927
26 -  1965  1927
27 -  2052  1964
28 -  2026  2021
29 -  2090  1993
30 -  2039  2016
31 -  2030  2009
32 -  1970  2094
33 -  2036  2048
34 -  2020  2046
35 -  2010  1998
36 -  2104  2041
37 -  2115  2019
38 -  1959  1986
39 -  1998  2031
40 -  2041  1977
41 -  1937  2060
42 -  1946  2048
43 -  2014  1986
44 -  1979  2072
45 -  2060  2002
46 -  2046  1913
47 -  1995  1970
48 -  1959  2020
49 -  1970  1997

继续阅读：algorithm language-agnostic random

How can I fairly choose an item from a list?

Psuedo Code:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Psuedo Code:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？