开发者

How many ids can I safely create with Guids?

I still can't get my head around the assertion that Guid is safe to use as a unique identifier. Wikipedia page says

The total number of keys ... is so large that the probability of the same number being generated randomly twice is negligible.

My question is, how many Ids can I safely generate until the probability becomes not negligible? I mean, there has to be a limit, right (at most, the pigeonhole constraint)?

开发者_如何转开发

If the implementation of Guid generation varies, let's assume .NET Guid.


I did a test run by myself and after one week and one terabyte of guids there was still no duplicate.

See here to have an idea of the probability.


Strictly from Wikipedia Random UUID probability of duplicates.

It describes the probability of collision for java.util.UUID which has 122 meaningful bits. System.Guid in .NET uses all 128 bits though but this article will give you some aproximations.

In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.


There are several different ways to generate GUIDs. Some implementations are stronger than others. A good GUID implementation approximates a random 128 bit number, meaning there are 2 to the 128 different states possible in a GUID (which is about 3.4 times 10 to the 38). The algorithms are usually not completely random and may contain information about the time the GUID was generated and/or the machine it was generated on.

By comparison, there are an estimated 9 × 10 to the 21 stars in the observable universe.

If you have 3.4 times 10 to the 38 states and you want to examine a (potentially large) sample of states to see if any two states are the same, that's known as the Birthday Problem. If you walk through the math, you'll see that indeed you need a VERY large number of samples to have a meaningful probability of two GUIDs being the same (and if the GUID generation method includes information about the machine and/or time generated, that places further constraints on just HOW the GUIDs can be generated).

I recently did the math for hash collisions for a set of 1,000,000 data points and found that with 40 bits, the probability of hash collision is very, very low. For 128 bits the probably of hash collision (for the same 1,000,000 data points) is astronomically low.


to my knowledge Guids are generated differently in every machine, means that if you start creating them right now in your machine you will never risk to create the same one as onether machine could do.

If you really want to stress this you can start creating and storing them in a database with unique index on that column and see how many rows you will insert before the first conflict is detected, I suppose your application will run for several years at minimum.

Edit:

it has the same range of IPv6 addresses and this what I've found on IPv6:

128 bit address space. In other words theoretically there are 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses available. This means there are approximately 6.67 * 10^27 IPv6 addresses per square meter on our planet.

do you think that 6.67 * 10^27 records in the db table are enough? then in every square meter of the planet we have a computer generating its own GUIDS, also over the oceans, the sahara and so on... I think we can consider this Unique enough.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜