Will using a substring of an MD5 hash like this be unique enough?
What I am trying to do is create a 12-character ID for articles on my website, similar to how YouTube handles its video IDs (http://www.youtube.com/watch?v=53iddd5IcSU). Right now I am generating an MD5 hash and then grabbing 12 characters of it like this:
$ArticleId = substr(md5("Article".$currentID),10,12);
where $currentID is the numeric ID from the database (e.g. 144).
I am slightly paranoid that I will run into a duplicate $ArticleId, but realistically what are the chances that this will happen? And also, being that the column in my database is unique, how can I handle this rare scenario without having an ugly error thrown?
P.S. I made a small script to check for duplicates within the first 5000 $ArticleId values and there were none.
EDIT: I don't like the way the base64_encode hashes look so I did this:
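function retryAID($currentID)
{
    // Derive a second candidate from a different input (the doubled ID).
    $AID = substr(md5("Article" . ($currentID * 2)), 10, 12);
    $setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID ";
    // Note: this candidate is deterministic, so if it also collides the call recurses forever.
    mysql_query($setAID) or retryAID($currentID);
}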
$AID = substr(MD5("Article".$currentID),10,12);
$setAID = "UPDATE `table` SET `artID` = '$AID' WHERE `id` = $currentID ";
mysql_query($setAID) or retryAID($currentID);
Since the AID column is unique the mysql_query will throw an error and the retryAID function will find a unique id...
What's wrong with using a sequential id? The database will handle this for you.
That aside, 12 characters is still 96 bits. 2^96 = 79228162514264337593543950336 possible hashes. Even though MD5 is known to have collision vulnerabilities, there's a world of difference between the possibility of a collision and the probability of actually seeing one.
Update:
Based on the return value of the PHP md5 function you're using, my numbers above aren't quite right.
Returns the hash as a 32-character hexadecimal number.
Since you're taking 12 characters from a 32-character hexadecimal number (and not 12 bytes of the 128-bit hash), the actual number of possible hashes you could end up with is 16^12 = 2^48 = 281474976710656. Still quite a few.
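To put a rough number on "what are the chances", the usual birthday-bound approximation can be applied to the 16^12 ID space. The sketch below is only an estimate: it assumes the 12-character substrings behave like uniformly random values, and the function name collision_probability is made up for illustration.

```php
<?php
// Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / (2N)),
// where n is the number of ids and N the size of the id space.
// Assumes ids are uniformly distributed, which is only an approximation here.
function collision_probability(float $n, float $N): float
{
    // expm1 keeps precision when the probability is very small.
    return -expm1(-$n * ($n - 1) / (2 * $N));
}

$space = pow(16, 12); // 12 hex characters

foreach ([5000, 1000000, 20000000] as $n) {
    printf("%d ids: %.3e\n", $n, collision_probability($n, $space));
}
```

For the 5000 articles already checked, the estimate is on the order of 10^-8, so finding no duplicates there is expected. The probability grows roughly with the square of the number of articles, which is why the unique column and some kind of retry are still worth keeping.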
<?php
function get_id()
{
    $max = 1679615; // pow(36, 4) - 1;
    $id = '';
    for ($i = 0; $i < 3; ++$i)
    {
        $r = mt_rand(0, $max);
        $id .= str_pad(base_convert($r, 10, 36), 4, "0", STR_PAD_LEFT);
    }
    return $id;
}
?>
Returns a 12-character number in base 36, which gives 36^12 = 4,738,381,338,321,616,896 possibilities. (The probability of collision depends on the distribution of the random number generator.)
To ensure no collisions, you'll need to loop:
<?php
do {
    $id = get_id();
} while ( !update_id($id) );
?>
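The loop above relies on an update_id() function that is not shown. One possible shape for it, sketched here with PDO rather than the mysql_* functions from the question, is to attempt the UPDATE and treat a unique-constraint violation (SQLSTATE 23000) as "try again" instead of letting the error surface. The table name articles, the extra $db and $rowId parameters, and the in-memory SQLite database are assumptions made so the example can run on its own.

```php
<?php
// Hypothetical helper: returns true if the id was stored,
// false if the unique column rejected it as a duplicate.
function update_id(PDO $db, int $rowId, string $artId): bool
{
    try {
        $stmt = $db->prepare('UPDATE articles SET artID = ? WHERE id = ?');
        $stmt->execute([$artId, $rowId]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation (e.g. duplicate key).
        if ((string) $e->getCode() === '23000') {
            return false;
        }
        throw $e; // anything else is a real error
    }
}

// Demo against an in-memory SQLite database (assumed schema).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, artID TEXT UNIQUE)');
$db->exec("INSERT INTO articles (id, artID) VALUES (1, 'aaaaaaaaaaaa')");
$db->exec('INSERT INTO articles (id, artID) VALUES (2, NULL)');

var_dump(update_id($db, 2, 'aaaaaaaaaaaa')); // duplicate of row 1
var_dump(update_id($db, 2, 'bbbbbbbbbbbb')); // free value
```

With a helper like this, the do/while loop keeps generating candidates until one is accepted, and the visitor never sees a database error for a duplicate.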
No, not very unique.
Why not base64 encode it if you need it shorter?
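As a rough illustration of the base64 suggestion: encoding the raw 16-byte digest instead of its hex form packs 6 bits into each character, so 12 characters carry 72 bits rather than 48. This is a minimal sketch, assuming a URL-safe alphabet is wanted; the function name article_id is made up for the example.

```php
<?php
// Hypothetical helper: 12-character id from the raw MD5 digest.
function article_id(int $currentID): string
{
    $raw = md5('Article' . $currentID, true);       // 16 raw bytes, not 32 hex chars
    $b64 = strtr(base64_encode($raw), '+/', '-_');  // swap in URL-safe characters
    return substr($b64, 0, 12);                     // 12 chars x 6 bits = 72 bits
}

echo article_id(144), "\n";
```

The result mixes upper- and lower-case letters, digits, '-' and '_', much like the YouTube id in the question. Collisions are still possible, so the unique column remains the real safeguard.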
How about a UUID?
http://php.net/manual/en/function.uniqid.php