
Which of these algorithms is better, in performance and asymptotic order, for generating N unique random numbers in the range 1..n?

Option 1

Take an array of n elements: { 1, 2, 3, ..., n }. Shuffle the array using any of the standard algorithms for randomly shuffling arrays. The first N elements of the shuffled array are what you are looking for.

Option 2

Simply call Random.Next() in a loop and check whether the value is already in a Dictionary, repeating until we have N numbers.

Notice that N << n (N is much smaller than n). Both options are sketched in code just below.
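For concreteness, here is a minimal sketch of the two options as described. The question's snippets (Random.Next(), Dictionary) read like C#, but this sketch is in Java, and the method names option1 and option2 are purely illustrative.

import java.util.*;

class TwoOptions {
    // Option 1: build 1..n, shuffle the whole array, take the first N entries.
    static int[] option1(int n, int N, Random rnd) {
        List<Integer> pool = new ArrayList<>(n);
        for (int v = 1; v <= n; v++) pool.add(v);
        Collections.shuffle(pool, rnd);
        int[] out = new int[N];
        for (int i = 0; i < N; i++) out[i] = pool.get(i);
        return out;
    }

    // Option 2: draw repeatedly, rejecting values already seen, until N are collected.
    static int[] option2(int n, int N, Random rnd) {
        Set<Integer> seen = new LinkedHashSet<>();
        while (seen.size() < N) seen.add(rnd.nextInt(n) + 1);   // uniform in 1..n
        return seen.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(Arrays.toString(option1(100, 5, rnd)));
        System.out.println(Arrays.toString(option2(100, 5, rnd)));
    }
}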


Neither of those is best. What you want is a (partial) Fisher-Yates shuffle. The problem with the first solution is that you do a lot of unnecessary work up front; the problem with the second is that the chance of a duplicate grows as you go, so you end up throwing away a lot of values.

For a very efficient solution that gives you a subset of your values with zero possibility of duplicates (and no unnecessary up-front shuffling), a partial Fisher-Yates is the way to go.

dim n[N]                  // gives n[0] through n[N-1]
for each i in 0..N-1:
    n[i] = i              // initialise them to their indexes
nsize = N                 // starting pool size
do N times:
    i = rnd(nsize)        // give a number between 0 and nsize-1
    print n[i]
    nsize = nsize - 1     // these two lines effectively remove the used number
    n[i] = n[nsize]

By simply selecting a random number from the pool, replacing it with the top number from that pool, then reducing the size of the pool, you get a shuffle without having to worry about a large number of swaps up front.

This is important when the numbers involved are large, because it avoids an unnecessary start-up delay.

For example, examine the following bench-check, choosing 10-from-10:

<------ n[] ------>
0 1 2 3 4 5 6 7 8 9  nsize  rnd(nsize)  output
-------------------  -----  ----------  ------
0 1 2 3 4 5 6 7 8 9     10           4       4
0 1 2 3 9 5 6 7 8        9           7       7
0 1 2 3 9 5 6 8          8           2       2
0 1 8 3 9 5 6            7           6       6
0 1 8 3 9 5              6           0       0
5 1 8 3 9                5           2       8
5 1 9 3                  4           1       1
5 3 9                    3           0       5
9 3                      2           1       3
9                        1           0       9

You can see the pool reducing as you go and, because you're always replacing the used one with an unused one, you'll never have a repeat.
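For reference, here is a runnable version of the same pool technique, sketched in Java and generalised to the question's setting: the pool is initialised to 1..n and only N draws are made, so there is no full up-front shuffle. This is my own translation of the pseudocode above, not code from the original answer.

import java.util.Random;

class PartialShuffle {
    // Draw N distinct values uniformly from 1..n using the pool technique above:
    // pick a random live slot, emit it, overwrite it with the last live slot, shrink the pool.
    static int[] draw(int n, int N, Random rnd) {
        int[] pool = new int[n];
        for (int i = 0; i < n; i++) pool[i] = i + 1;    // pool holds 1..n
        int[] out = new int[N];
        int size = n;                                   // current pool size
        for (int k = 0; k < N; k++) {
            int i = rnd.nextInt(size);                  // random index into the live pool
            out[k] = pool[i];
            size--;
            pool[i] = pool[size];                       // move the top of the pool down
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(draw(100, 5, new Random())));
    }
}

Note that this still allocates an O(n) pool up front; a later answer below shows how even that can be avoided when N << n.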


It entirely depends on the two values (n and N).

For a large n and small N (e.g. pick two distinct random values between zero and a million), option two will be better. The expected time here is tricky to calculate, and well beyond my Sunday-evening abilities, but roughly: you need to iterate N times in the outer loop; for each candidate you need to test whether you've already returned that value (effectively O(1) with a hash-based set, or O(m) if you scan a list of the m values picked so far); and you potentially need to try again when you hit a value you've seen before. There is no hard upper bound on the running time, only a probabilistic expectation that is fiddly to compute.

When n and N are close to each other (e.g. pick 99 random values between one and a hundred inclusive), a tweak to option one will be better: only actually pick as many values as you need, rather than shuffling the whole array. With a partial Fisher-Yates shuffle you'll get O(n).

In practice:

  • If I knew I'd have a large n and a small N, I'd pick the second option
  • If I knew I'd have a value of n that is "not hugely larger" than N, I'd probably pick the first option
  • If it were too close to call but I knew the values ahead of time, I'd run it several times each way and benchmark the two (a rough harness is sketched after this list)
  • If I didn't know the values ahead of time at all, I'd run the two options on lots of different (n, N) pairs to get a better feel for where the balance lies
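If it does come down to measuring, a crude harness along these lines compares the two options on whatever (n, N) pairs matter to you. This is my own sketch in Java, with no JIT warm-up or statistical care, so treat the numbers as rough indications only.

import java.util.*;

class RoughBench {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        int[][] cases = { {1_000_000, 2}, {1_000_000, 1000}, {100, 99} };  // (n, N) pairs to try
        for (int[] c : cases) {
            int n = c[0], N = c[1];

            // Option 1: shuffle the full 1..n list and take the first N entries.
            long t1 = System.nanoTime();
            List<Integer> pool = new ArrayList<>(n);
            for (int v = 1; v <= n; v++) pool.add(v);
            Collections.shuffle(pool, rnd);
            List<Integer> firstN = pool.subList(0, N);
            long d1 = System.nanoTime() - t1;

            // Option 2: draw and reject duplicates until N distinct values are held.
            long t2 = System.nanoTime();
            Set<Integer> seen = new HashSet<>();
            while (seen.size() < N) seen.add(rnd.nextInt(n) + 1);
            long d2 = System.nanoTime() - t2;

            System.out.printf("n=%d N=%d (%d and %d values)  shuffle: %.2f ms  reject: %.2f ms%n",
                              n, N, firstN.size(), seen.size(), d1 / 1e6, d2 / 1e6);
        }
    }
}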


Partial Fisher-Yates, with some tweaks*:

Algorithm to generate 1000 distinct integers in the range [0,8000]?

&

Algorithm to select a single, random combination of values?

* The main takeaway here is that memory usage is reduced: it is now proportional at most to the number of items selected, not to the number of items selectable. This can provide major savings when N << n (as you have mentioned). (Space usage is capped at O(n/2), no matter how close N is to n.)

Running time is O(N).

Apart from that, it is a fairly standard partial Fisher-Yates, a.k.a. Knuth shuffle.
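The linked questions spell out the details. Purely as an illustration (my own Java sketch of the general idea, not the exact code behind those links), the usual way to keep memory proportional to N is to run Fisher-Yates over a virtual 0..n-1 array and store only the slots that have been displaced, in a hash map:

import java.util.*;

class SparsePartialShuffle {
    // Partial Fisher-Yates over a virtual array 0..n-1: a slot that was never
    // touched still holds its own index, so only displaced slots need storing.
    static int[] draw(int n, int N, Random rnd) {
        Map<Integer, Integer> displaced = new HashMap<>();    // slot index -> current value
        int[] out = new int[N];
        for (int i = 0; i < N; i++) {
            int j = i + rnd.nextInt(n - i);                   // j in [i, n-1]
            int vj = displaced.getOrDefault(j, j);            // value currently in slot j
            int vi = displaced.getOrDefault(i, i);            // value currently in slot i
            out[i] = vj + 1;                                  // +1 maps 0..n-1 onto 1..n
            displaced.put(j, vi);                             // virtual swap of slots i and j
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(draw(8000, 10, new Random())));
    }
}

The map never holds more than N entries, so both expected time and space are O(N), matching the figures above.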


At the extreme, option 2 could in principle never finish, since each number it generates could already be in the set. In practice it does finish, but you will generally iterate over more than N numbers, and far more as N approaches n.

Option 1, by contrast, is guaranteed to complete in finite time.
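To put a rough number on that: with uniform draws from 1..n, once k distinct values are in hand the next new one takes n/(n-k) draws on average, so collecting N distinct values takes n/n + n/(n-1) + ... + n/(n-N+1) draws in expectation. A small sketch (my own, in Java) that evaluates this sum:

class ExpectedDraws {
    // Expected number of uniform draws from 1..n needed to collect N distinct values.
    static double expectedDraws(int n, int N) {
        double total = 0;
        for (int k = 0; k < N; k++) total += (double) n / (n - k);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(expectedDraws(1_000_000, 2));  // ~2.000001: negligible waste when N << n
        System.out.println(expectedDraws(100, 99));       // ~419: more than four times N when N is close to n
    }
}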
