SHA-1 substring question

I am making a pastebin-type site and am trying to make the id a random string, like paste.com/4RT65L.

I take the SHA-1 of the id before I add it to the database, but I only keep a substring: the first 8 characters of the hash. Is there a possibility of two ids producing the same 8-character substring? I don't want a second paste to accidentally get an id that has already been used.


Well, the odds of a collision in the first 8 characters are significantly higher than the odds of a collision between two full SHA-1 hashes, but that doesn't mean it is likely to happen.

I would recommend that you do some testing on it. Generate random input and see how long it takes before you have a collision. If you like the results, then go with it. Otherwise, you'll need a longer string.
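As a rough sketch of that kind of test (assuming PHP 7 or later for `random_bytes()`; the function name `drawsUntilCollision` is made up for this example), you could hash random input and count how many draws it takes before two 8-character prefixes match:

```php
<?php
// Count how many random inputs are hashed before two of them share
// the same SHA-1 prefix of $prefixLength hex digits.
function drawsUntilCollision(int $prefixLength = 8): int
{
    $seen = [];
    $count = 0;
    while (true) {
        $count++;
        // random_bytes() stands in for "random input" here.
        $prefix = substr(sha1(random_bytes(16)), 0, $prefixLength);
        if (isset($seen[$prefix])) {
            return $count; // this draw repeated an earlier prefix
        }
        $seen[$prefix] = true;
    }
}

// Average a few trials, since individual runs vary widely.
$trials = 5;
$total = 0;
for ($i = 0; $i < $trials; $i++) {
    $total += drawsUntilCollision();
}
echo "Average draws before first collision: ", intdiv($total, $trials), "\n";
```

Passing a smaller `$prefixLength` makes collisions show up much sooner, which is a quick way to see how strongly the length matters.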

EDIT: You can also calculate the odds of a collision by looking at the Birthday Paradox.

Basically, if you are taking the first 8 hex digits from the SHA-1, then you have 16**8 (4,294,967,296) different available combinations.

Using an online Birthday Paradox calculator, after about 9,300 hashes you will have a 1% chance of a collision. It will take about 30,000 hashes before you have a 10% chance, and about 77,000 before you have a 50% chance.
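If you would rather not rely on an online calculator, the usual birthday approximation is n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible values and p is the collision probability you want to check. Here is a small sketch of it (the function name `hashesForCollisionChance` is just an illustration, and it assumes 64-bit PHP so that `16 ** 8` stays an integer):

```php
<?php
// Approximate number of hashes needed to reach collision probability $p
// when there are $space equally likely values (birthday approximation):
//   n ≈ sqrt(2 * space * ln(1 / (1 - p)))
function hashesForCollisionChance(float $p, float $space): int
{
    return (int) ceil(sqrt(2.0 * $space * log(1.0 / (1.0 - $p))));
}

$space = 16 ** 8; // 4,294,967,296 possible 8-hex-digit prefixes

foreach ([0.01, 0.10, 0.50] as $p) {
    printf(
        "%d%% chance after about %d hashes\n",
        (int) round($p * 100),
        hashesForCollisionChance($p, $space)
    );
}
```

This is only an approximation, but it is close for values of p that are not near 1. Changing `$space` to `16 ** 10` or `16 ** 12` shows how much a couple of extra characters buy you.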

It's important to point out that as long as your hash function does a decent job of being pseudo-random, it doesn't matter which one you use (SHA-1, MD5, or any form of checksum). These numbers assume the outputs are completely random, so a real hash function can only approach them, and the better the function, the closer it gets.

So in the end, it depends on how much traffic you are expecting. If this is a small site, you can probably get away with it. If you expect a lot of traffic, your odds of a collision become very high.


Before assigning the id, you could always check that it isn't taken... or even better, put a unique constraint on the database column... problem solved. :)
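A minimal sketch of that idea, assuming PDO with the SQLite driver and a hypothetical `pastes` table (the table, column, and function names are invented for this example, not taken from the question). The database rejects a duplicate short id, and the code simply tries again with a new one:

```php
<?php
// Hypothetical schema: a `pastes` table whose `short_id` column is UNIQUE.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(
    'CREATE TABLE pastes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        short_id TEXT NOT NULL UNIQUE,
        body TEXT NOT NULL
    )'
);

// Insert a paste, generating a fresh short id whenever the database
// rejects one as a duplicate. Gives up after $maxAttempts tries.
function insertPaste(PDO $db, string $body, int $maxAttempts = 5): string
{
    $stmt = $db->prepare('INSERT INTO pastes (short_id, body) VALUES (?, ?)');
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $shortId = strtoupper(substr(sha1(random_bytes(16)), 0, 8));
        try {
            $stmt->execute([$shortId, $body]);
            return $shortId;
        } catch (PDOException $e) {
            // SQLSTATE class 23 is an integrity constraint violation;
            // anything else is a real error and is rethrown.
            if (substr((string) $e->getCode(), 0, 2) !== '23') {
                throw $e;
            }
        }
    }
    throw new RuntimeException('Could not find a free short id');
}

echo insertPaste($db, 'hello world'), "\n";
```

Letting the constraint do the checking avoids the race you get with "SELECT first, then INSERT", where two requests can both see the id as free.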

Wait, you say SHA-1 of the id. You don't mean the auto-increment id, do you? My first guesses would be:

356a192b
da4b9237
77de68da

If you are using a random id, why run sha1 on it?
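To illustrate both points, here is a short sketch (the function names are made up for the example). The first function shows why unsalted hashes of sequential ids are guessable: anyone can compute them. The second takes 8 hex characters straight from a random source, with no SHA-1 step, assuming PHP 7 or later for `random_bytes()`:

```php
<?php
// Unsalted SHA-1 prefix of a sequential id: anyone can reproduce this.
function predictableId(int $autoIncrementId): string
{
    return substr(sha1((string) $autoIncrementId), 0, 8);
}

// 4 random bytes rendered as 8 hex characters; no hashing involved.
function randomId(): string
{
    return bin2hex(random_bytes(4));
}

foreach ([1, 2, 3] as $id) {
    echo $id, ' => ', predictableId($id), "\n";
}
echo 'random => ', randomId(), "\n";
```

A random id can still collide with an existing one at the same birthday-paradox rate, so it needs the same uniqueness check as a hashed one.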


I figured it out, my code is:

strtoupper(substr(sha1($token_start . $id . $token_end), 0, 8))

where $id is the next auto-increment id, which I get by counting the ids already in the database and adding 1.

Then, when it inserts the entry, it inserts the hashed id.

$token_start and $token_end are both arbitrary strings that you choose yourself, so that the generated ids are specific to your site.

I ran a loop that inserted 32,000 rows into a database (just the auto-increment id along with the new id), then did a search with DISTINCT and didn't get any duplicates. That's more than enough for me. Any comments would be helpful. I don't know how long it would take until I get a collision. If anybody knows when the first one would be, that would be awesome.
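For anyone who wants to find the first collision without touching a database, here is a rough in-memory sketch of the same loop. The token values below are placeholders, not the real ones, and `firstCollision` is a name made up for this example. The answer depends entirely on the tokens you pick:

```php
<?php
// Placeholder tokens; substitute your own secret strings.
$token_start = 'example-prefix';
$token_end = 'example-suffix';

// Return the first sequential id whose short id repeats an earlier one,
// or null if none of the ids up to $limit collide.
function firstCollision(string $start, string $end, int $limit, int $length = 8): ?int
{
    $seen = [];
    for ($id = 1; $id <= $limit; $id++) {
        $short = strtoupper(substr(sha1($start . $id . $end), 0, $length));
        if (isset($seen[$short])) {
            return $id;
        }
        $seen[$short] = true;
    }
    return null;
}

$result = firstCollision($token_start, $token_end, 200000);
echo $result === null
    ? "No collision in the first 200000 ids\n"
    : "First collision at id $result\n";
```

A clean run over 32,000 ids is what you would expect most of the time, since the birthday estimate puts the chance of a collision by then at roughly 10%.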
