
Will using a substring of an MD5 hash like this be unique enough?

What I am trying to do is create a 12-character ID for articles on my website, similar to how YouTube handles its video IDs (http://www.youtube.com/watch?v=53iddd5IcSU). Right now I am generating an MD5 hash and then grabbing 12 characters of it like this:

$ArticleId = substr(MD5("Article".$currentID),10,12);

where $currentID is the numeric ID from the database (e.g. 144).

I am slightly paranoid that I will run into a duplicate $ArticleId, but realistically, what are the chances that this will happen? Also, since the column in my database is unique, how can I handle this rare scenario without having an ugly error thrown?

P.S. I made a small script to check for duplicates within the first 5000 $ArticleId values and there were none.

EDIT: I don't like the way the base64_encode hashes look, so I did this:

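function retryAID($currentID, $attempt = 2)
{
    // Vary the hashed input on each attempt; hashing the same value again
    // would reproduce the same ID and recurse forever.
    $AID = substr(MD5("Article".($currentID * $attempt)),10,12);

    $setAID = "UPDATE `table` SET  `artID` =  '$AID' WHERE `id` = $currentID ";
    mysql_query($setAID) or retryAID($currentID, $attempt + 1);
}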


$AID = substr(MD5("Article".$currentID),10,12);

$setAID = "UPDATE `table` SET  `artID` =  '$AID' WHERE `id` = $currentID ";
mysql_query($setAID) or retryAID($currentID);

Since the AID column is unique, mysql_query will fail (return false) on a duplicate and the retryAID function will find a unique ID...


What's wrong with using a sequential id? The database will handle this for you.

That aside, 12 characters is still 96 bits. 2^96 = 79228162514264337593543950336 possible hashes. Even though MD5 is known to have collision vulnerabilities, there's a world of difference between the possibility of a collision and the probability of actually seeing one.

Update:

Based on the return value of the PHP md5 function you're using, my numbers above aren't quite right.

"Returns the hash as a 32-character hexadecimal number."

Since you're taking 12 characters from a 32-character hexadecimal number (and not 12 bytes of the 128-bit hash), you only keep 48 bits, so the actual number of possible hashes you could end up with is 16^12 = 281474976710656. Still quite a few.
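
To put a rough number on "quite a few", here is a small sketch of the usual birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n articles and N = 16^12 possible IDs. It assumes the 12 hex characters behave like uniformly random values, which is only an approximation for MD5 of sequential inputs; collision_probability is a name made up for this example.

```php
<?php
// Approximate probability that at least two of $n IDs collide when each ID
// is drawn uniformly from $space possible values (birthday approximation).
function collision_probability(float $n, float $space): float
{
    // 1 - exp(x) computed as -expm1(x) to keep precision for tiny results.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

$space = pow(16, 12); // 12 hex characters = 48 bits

foreach ([5000, 1000000, 20000000] as $n) {
    printf("%d articles: %.3e\n", $n, collision_probability($n, $space));
}
```

Under that assumption the chance of any duplicate among 5,000 articles is on the order of 4 in 100 million, and it only approaches 50% at around 20 million articles.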


<?php
  function get_id()
  {
    $max = 1679615; // pow(36, 4) - 1;
    $id = '';

    for ($i = 0; $i < 3; ++$i)
    {
      $r = mt_rand(0, $max);
      $id .= str_pad(base_convert($r, 10, 36), 4, "0", STR_PAD_LEFT);
    }
    return $id;
  }
?>

Returns a 12 character number in base-36, which gives 4,738,381,338,321,616,896 possibilities. (The probability of collision depends on the distribution of the random number generator.)

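To ensure no collisions, you'll need to loop until the update succeeds (update_id here stands for your own function that returns false when the ID is already taken):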

<?php
do {
  $id = get_id();
} while ( !update_id($id) );
?>
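
If it helps, here is a slightly more defensive sketch of the same loop with a cap on the number of attempts, so a failure that has nothing to do with duplicates cannot spin forever. The names assign_unique_id and $tryStore are made up for this example, and the in-memory $taken array is only a stand-in for the UNIQUE column; in real code $tryStore would run the UPDATE and return false on a duplicate-key failure.

```php
<?php
// Hypothetical helper: keeps generating candidate IDs until $tryStore
// accepts one, giving up after $maxAttempts tries.
function assign_unique_id(callable $generate, callable $tryStore, int $maxAttempts = 5): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; ++$attempt) {
        $id = $generate();
        if ($tryStore($id)) {
            return $id;
        }
    }
    throw new RuntimeException("No unique ID found after $maxAttempts attempts");
}

// Stand-in for the database's UNIQUE column: a set of IDs already in use.
$taken = ['aaaaaaaaaaaa' => true];
$tryStore = function (string $id) use (&$taken): bool {
    if (isset($taken[$id])) {
        return false; // simulates a duplicate-key failure
    }
    $taken[$id] = true;
    return true;
};

// Any 12-character generator works here; this one is just for the demo.
$generate = function (): string {
    return bin2hex(random_bytes(6));
};

echo assign_unique_id($generate, $tryStore), "\n";
```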


No, not very unique.

Why not base64 encode it if you need it shorter?
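
For instance, one way to read that suggestion (a sketch, not necessarily the exact approach meant above): take the first 9 raw bytes of the MD5 digest and base64-encode them. Nine bytes encode to exactly 12 characters with no padding, and they carry 72 bits instead of the 48 bits you get from 12 hex characters. short_id is a made-up name, and swapping "+" and "/" for "-" and "_" is an extra assumption to keep the result URL-safe.

```php
<?php
// Hypothetical helper: a 12-character, URL-safe ID built from the first
// 9 bytes (72 bits) of the raw MD5 digest of $input.
function short_id(string $input, int $bytes = 9): string
{
    $raw = substr(md5($input, true), 0, $bytes); // true = raw binary output
    // Replace the two non-URL-safe base64 characters and drop any padding.
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo short_id("Article" . 144), "\n";
```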


How about a UUID?

http://php.net/manual/en/function.uniqid.php
