开发者

Math question regarding Python's uuid4

I'm not great with statistical mathematics, etc. I've been wondering, if I use the following:

import uuid
unique_str = str(uuid.uuid4())
double_str = ''.join([str(uuid.uuid4()), str(uuid.uuid4())])

Is double_str string squared as unique as unique_str or just some amount more unique? Also, is there any negative implic开发者_开发知识库ation in doing something like this (like some birthday problem situation, etc)? This may sound ignorant, but I simply would not know as my math spans algebra 2 at best.


The uuid4 function returns a UUID created from 16 random bytes and it is extremely unlikely to produce a collision, to the point at which you probably shouldn't even worry about it.

If for some reason uuid4 does produce a duplicate it is far more likely to be a programming error such as a failure to correctly initialize the random number generator than genuine bad luck. In which case the approach you are using it will not make it any better - an incorrectly initialized random number generator can still produce duplicates even with your approach.

If you use the default implementation random.seed(None) you can see in the source that only 16 bytes of randomness are used to initialize the random number generator, so this is an a issue you would have to solve first. Also, if the OS doesn't provide a source of randomness the system time will be used which is not very random at all.

But ignoring these practical issues, you are basically along the right lines. To use a mathematical approach we first have to define what you mean by "uniqueness". I think a reasonable definition is the number of ids you need to generate before the probability of generating a duplicate exceeds some probability p. An approcimate formula for this is:

Math question regarding Python's uuid4

where d is 2**(16*8) for a single randomly generated uuid and 2**(16*2*8) with your suggested approach. The square root in the formula is indeed due to the Birthday Paradox. But if you work it out you can see that if you square the range of values d while keeping p constant then you also square n.


Since uuid4 is based off a pseudo-random number generator, calling it twice is not going to square the amount of "uniqueness" (and may not even add any uniqueness at all).

See also When should I use uuid.uuid1() vs. uuid.uuid4() in python?


It depends on the random number generator, but it's almost squared uniqueness.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜