Designing a Good Hash Function for Set Length URL Shortening in PHP
I'm working on a URL shortener. The input is a URL and the output needs to be a 4 character string (alphanumeric, case-sensitive).
I calculated it out that if I use 4 characters with a case-sensitive alphanumeric key space I should potentially be able to store 64^4 (16777216) URLs until I run out of space.
I also don't want my URL shortener to generate any short URLs tha开发者_Python百科t are offensive four letter words. It would be unfortunate if someone made a short URL that was domain.com/f**k. You get the picture...
Any ideas on the best way to go about this? I feel like I'll be using base64_encode somewhere in the process.
If I were you, I would make a case sensitive alphanumeric increment-er. Just increment, and assign the number to a database row. To check for bad words, just check against a black list. If it passes, great. If not, just increment again.
This way, instead of a hash algorithm, they're just in order. The first few would look like this:
id | url
-------------------------
0000 | http://google.com
0001 | http://yahoo.com
0002 | http://example.com
...
000a | http://mail.google.com
000b | http://adobe.com
...
000A | http://microsof.com
...
0010 | http://w3.org
...
00a0 | http://youtube.com
...
00A0 | http://stackoverflow.com
And so on.
Here's a hint on how the function will work: http://us3.php.net/manual/en/function.ord.php
BTW, my math might be wrong, but I think it's (10 + 26 + 26) ^ 4 = 14776336
Edit: Just for the fun and the challenge, I wrote an incrementer function. When the max is reached, it returns false, so just compare it to false (with ===) when using it.
http://pastebin.com/957KPn4p
It vaguely reminded me of thisHow do I create unique IDs, like YouTube?. You just have to be sure to check (in a more limited space) the possibility of a collision.
精彩评论