Shuffling a huge range of numbers using minimal storage
I've got a very large range/set of numbers, (1..1236401668096), that I would basically like to 'shuffle', i.e. randomly traverse without revisiting the same number. I will be running a Web service, and each time a request comes in it will increment a counter and pull the next 'shuffled' number from the range. The algorithm will have to accommodate the server going offline, and be able to restart the traversal using the persisted value of the counter (something like seeding a pseudo-random number generator and getting the same pseudo-random number given the seed and which iteration you are on).
I'm wondering if such an algorithm exists or is feasible. I've seen the Fisher-Yates shuffle, but its first step is to "write down the numbers from 1 to N", which would take terabytes of storage for my entire range. Generating a pseudo-random number for each request might work for a while, but as the database/tree fills up, collisions will become more common and could degrade performance (after 1 billion hits, each new draw already has a ~0.08% chance of colliding, i.e. 10^9 / 1.24x10^12, according to my calculation). Is there a more ideal solution for my scenario, or is this just a pipe dream?
The reason for the shuffling is that being able to correctly guess the next number in the sequence could lead to a minor DoS vulnerability in my app, but also because the presentation layer will look much nicer with a wider number distribution (I'd rather not go into details about exactly what the app does). At this point I'm considering just using a PRNG and dealing with collisions, or shuffling range slices: starting with (1..10000000).to_a.shuffle, then (10000001..20000000).to_a.shuffle, etc. as each range's numbers start to run out.
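A minimal sketch of that slice-shuffling fallback in Ruby (the slice size is the one mentioned above; persisting the shuffled slice, or a seed that reproduces it, is left out):

    SLICE = 10_000_000

    # Shuffle one ten-million-number slice at a time; move on to slice
    # i + 1 once slice i is exhausted.
    def shuffled_slice(i)   # i = 0 for 1..10_000_000, i = 1 for the next, ...
      lo = i * SLICE + 1
      (lo..(lo + SLICE - 1)).to_a.shuffle
    end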
Any mathemagicians out there have any better ideas/suggestions?
Concatenate a PRNG or LFSR sequence with /dev/random bits
There are several algorithms that can generate pseudo-random numbers with arbitrarily large and known periods. The two obvious candidates are the linear congruential generator (LCG) and the linear feedback shift register (LFSR), but there are others, such as the Mersenne Twister.
The period of these generators can easily be made to fit your requirements, and then you simply won't have collisions.
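For instance, a minimal sketch in Ruby, assuming a power-of-two modulus and constants borrowed from java.util.Random (illustrative, not vetted for this modulus). By the Hull-Dobell theorem the map x' = (A*x + C) mod M has full period M when M is a power of two, C is odd, and A ≡ 1 (mod 4); states that fall outside the target range are simply skipped ("cycle-walking"), which preserves the no-repeats property:

    N = 1_236_401_668_096   # size of the range to traverse
    M = 2**41               # smallest power of two >= N
    A = 25_214_903_917      # multiplier, A % 4 == 1
    C = 11                  # odd increment

    # Advance to the next state, skipping states >= N ("cycle-walking").
    # The LCG permutes 0...M, so every value in 0...N is still visited
    # exactly once per period, at an average cost of M/N (~1.8) steps.
    def next_state(x)
      loop do
        x = (A * x + C) % M
        return x if x < N
      end
    end

    state = 0   # persist this single integer to survive restarts
    3.times { state = next_state(state); puts state + 1 }   # map 0...N to 1..N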
You could deal with the predictable behavior of PRNGs and LFSRs by prepending 10, 20, or 30 bits of cryptographically hashed entropy from an interface like /dev/random. Because the deterministic part of your number is known to be unique, it makes no difference if the genuinely random part ever repeats.
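A sketch of that idea in Ruby, using SecureRandom rather than reading /dev/random directly (the 20-bit noise width and the 41-bit shift are illustrative assumptions):

    require 'securerandom'

    # Prepend 20 cryptographically random bits to the unique deterministic
    # value. The low 41 bits alone never repeat, so uniqueness survives
    # even if the random high bits happen to repeat.
    def obscure(deterministic_value, random_bits: 20)
      noise = SecureRandom.random_number(2**random_bits)
      (noise << 41) | deterministic_value   # 41 bits cover the whole range
    end

    puts obscure(123_456_789)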
Divide and conquer? Break the range down into manageable chunks and shuffle those. You could divide the number range e.g. by value modulo n. Each group can be constructed on the fly and stays small, depending on n. Once a group is exhausted, you can use the next one.
For example if you choose an n of 1000, you create 1000 different groups. Pick a random number between 1 and 1000 (let's call this x) and shuffle the numbers whose value modulo 1000 equals x. Once you have exhausted that range, you can choose a new random number between 1 and 1000 (without x obviously) to get the next subset to shuffle. It shouldn't exactly be challenging to keep track of which numbers of the 1..1000 range have already been used, so you'd just need a repeatable shuffle algorithm for the numbers in the subset (e.g. Fisher-Yates on their "indices").
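A sketch of the repeatable per-group shuffle in Ruby. One caveat: with n = 1000 each group would still hold roughly 1.2 billion numbers, so this sketch assumes a much larger n of 1,000,000, leaving about 1.2 million numbers per group, which shuffles comfortably in memory:

    N = 1_236_401_668_096
    GROUPS = 1_000_000   # assumed group count (see caveat above)

    # All numbers in 1..N congruent to x (mod GROUPS), shuffled with a
    # seeded PRNG. Array#shuffle is a Fisher-Yates shuffle, and the fixed
    # seed makes it repeatable, so only the group index, seed, and current
    # position need to be persisted.
    def shuffled_group(x, seed)
      members = (x..N).step(GROUPS).to_a
      members.shuffle(random: Random.new(seed))
    end

    puts shuffled_group(7, 12_345).first(5)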
I guess the best option is to use a GUID/UUID. They are made for this type of thing, and it shouldn't be hard to find an existing implementation to suit your needs.
While collisions are theoretically possible, they are extremely unlikely. To quote Wikipedia:
"The probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs."
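In Ruby, for example, the standard library can produce a random version-4 UUID directly:

    require 'securerandom'

    puts SecureRandom.uuid   # => a random version-4 UUID string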