Convert MD5 to base62 for URL

I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?

I have read in many places that because the number from an MD5 hash is bigger than PHP can handle as an integer, it will be inaccurate. As I want a short URL anyway, I was not planning on using the whole hash, maybe just 8 characters of it.

So my question is how to get part of the number of an MD5 hash?

Also is it a bad idea to use only part of the MD5 hash?


I'm going to suggest something different here. Since you are only interested in using a decimal chunk of the MD5 hash, why not use another short numeric hash such as CRC32 or Adler-32? Here is an example:

$hash = sprintf('%u', crc32('your string here'));

This produces an unsigned 32-bit integer for your string, printed as a decimal number of up to 10 digits.

EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.

EDIT (Again): To work with arbitrary-length numbers you must use either the BCMath or the GMP extension. Here is a function that uses the BCMath extension and can convert between any bases from 2 to 62. Use it like this:

echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash

and the inverse:

echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
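If neither extension is available, the same hex-to-base-62 conversion can be sketched in plain PHP with schoolbook long division. This is not the bc_base_convert() function referred to above; the name hex_to_base62() and the alphabet order (0-9, A-Z, a-z) are assumptions made for illustration, and PHP 7 or later is assumed for intdiv().

```php
<?php
// Hypothetical helper, NOT the bc_base_convert() mentioned in the answer.
// The alphabet order (0-9, A-Z, a-z) is an assumption; any fixed order works
// as long as encoding and decoding agree on it.
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

function hex_to_base62(string $hex): string
{
    // Hex digits as small integers, most significant digit first.
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';

    while (count($digits) > 0) {
        // Divide the whole number by 62, one hex digit at a time.
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 16 + $d;      // never exceeds 61 * 16 + 15
            $q = intdiv($acc, 62);
            $remainder = $acc % 62;
            // Drop leading zeros of the quotient so the loop terminates.
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        // Each remainder is the next base-62 digit, least significant first.
        $out = BASE62_ALPHABET[$remainder] . $out;
        $digits = $quotient;
    }

    return $out;
}

echo hex_to_base62(md5('your url here')), PHP_EOL;
```

Because every intermediate value stays below 1000, the sketch never touches PHP's integer limits, whatever the length of the input.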

Hope it helps. =)


If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions, especially if you're truncating the hash. If you implement an ID-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second will be 2, and so on. If you're using MySQL, just add an AUTO_INCREMENT column.

To make a short id:

//the basic example
$sid = base_convert($id, 10, 36);

//if you're going to be needing 64 bit numbers converted 
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);

To make a short id back into the base-10 id:

//the basic example
$id = base_convert($sid, 36, 10);

//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));

Hope this helps!

If you truly want base 62 (which base_convert can't do, and which GMP only supports as of PHP 5.3.2), check this out: http://snipplr.com/view/22246/base62-encode--decode/
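For IDs that fit in a native PHP integer, a base 62 encoder and decoder needs no extension at all. The following is a minimal sketch, not the code behind the link above; the function names id_to_base62() and base62_to_id() and the alphabet order are assumptions, and PHP 7 or later is assumed.

```php
<?php
// Assumed alphabet order: 0-9, A-Z, a-z. Hypothetical helper names.
const ID_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

function id_to_base62(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $out = '';
    do {
        // Peel off the least significant base-62 digit each round.
        $out = ID_ALPHABET[$id % 62] . $out;
        $id = intdiv($id, 62);
    } while ($id > 0);
    return $out;
}

function base62_to_id(string $sid): int
{
    if ($sid === '') {
        throw new InvalidArgumentException('Short ID must not be empty');
    }
    $id = 0;
    foreach (str_split($sid) as $ch) {
        $pos = strpos(ID_ALPHABET, $ch);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid character: $ch");
        }
        // Horner's rule: shift the running value by one digit, add the new one.
        $id = $id * 62 + $pos;
    }
    return $id;
}

$sid = id_to_base62(123456789);
echo $sid, ' -> ', base62_to_id($sid), PHP_EOL;
```

The decoder does not guard against values that overflow the native integer range, so very long inputs would still need GMP or BCMath.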


You can do it like this: (Not all steps are in PHP; it's been a long time since I used it.)

  • Create an MD5 hash of the string as raw binary, like this:

    $hash = md5($script, true);

  • Convert that number to base 62.

    See the questions about base conversion of arbitrary sized numbers in PHP

  • Truncate the string to a length you like.

Using only a few of the bits of an MD5 hash is not unsafe in itself. All that changes is the likelihood of collisions, which grows as you keep fewer characters.
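How much the collision risk grows can be estimated with the standard birthday approximation, p ≈ 1 − exp(−k(k−1) / 2N), where k is the number of URLs and N = 62^length is the number of possible truncated hashes. The sketch below assumes the hash output is uniformly distributed; the function name collision_probability() is made up for illustration.

```php
<?php
// Birthday approximation for the chance that at least two of $urls items
// share the same truncated base-62 hash of $length characters.
// Assumes a uniformly distributed hash; hypothetical helper name.
function collision_probability(int $urls, int $length): float
{
    $space = pow(62.0, $length);                    // N = 62^length, as a float
    $x = (float) $urls * ($urls - 1) / (2.0 * $space);
    // 1 - exp(-x), written with expm1() to stay accurate when x is tiny.
    return -expm1(-$x);
}

foreach ([6, 8, 10] as $length) {
    printf("%2d chars, 1,000,000 URLs: %.6f\n", $length, collision_probability(1000000, $length));
}
```

With 8 characters and a million URLs the estimate is on the order of 0.2%, while 6 characters makes a collision nearly certain, so the truncation length should be chosen with the expected number of URLs in mind.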


There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.

Look here for the code of toBase62() and fromBase62().

http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html

The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string. You might even omit it altogether, or just copy the method over to your copy of StringUtils. Voilà.


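You can do something like this:

// 16 raw bytes instead of 32 hex characters
$hash = md5("The data to be hashed", true);
// Split into four unsigned 32-bit integers, keyed num1..num4
$ints = unpack("L*num", $hash);

$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);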


As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:

function base62hash($source, $chars = 22) {
  return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}

Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).

Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).


You may try base62x to get a safe and compatible encoded representation.

For more information, look up base62x, or simply -base62x in -NatureDNS.

shell> ./base62x -n 16 -enc 16AF 
1Ql
shell> ./base62x -n 16 -dec 1Ql 
16AF

shell> ./base62x 
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string 
Version: 0.60 


Here is an open-source Java library that converts MD5 strings to Base62 strings: https://github.com/inder123/base62

Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4

Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc

The conversion is two-way, so you will get the original md5 back if you convert it back to md5:

Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6

Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4



You could use a slightly modified Base 64 with - and _ instead of + and /:

function base64_url_encode($str) {
    return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
    return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}

Additionally you could remove the trailing padding = characters.

And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:

$raw_md5 = md5($str, true);
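Putting the pieces of this answer together, a minimal sketch might look like the following. The function name md5_base64url() is an assumption for illustration; it only combines the built-in md5(), base64_encode(), strtr() and rtrim() calls.

```php
<?php
// Hypothetical helper combining the steps above:
// raw MD5 -> Base64 -> URL-safe alphabet -> padding stripped.
function md5_base64url(string $str): string
{
    $raw = md5($str, true);                 // 16 raw bytes
    $b64 = base64_encode($raw);             // 24 characters, the last two are "=="
    return rtrim(strtr($b64, '+/', '-_'), '=');   // 22 URL-safe characters
}

echo md5_base64url('your url here'), PHP_EOL;
```

A 16-byte input always encodes to 22 significant characters plus two padding characters, so the result has the same length as a full base 62 encoding while needing no big-number arithmetic. To decode, reverse the character substitution before calling base64_decode().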