Fast, collision-free hash algorithm for path caching?
I'm working on converting a website using PHP. Part of my process is to verify that image paths don't point to non-existent images (i.e. that there are no broken images). Since many pages share certain images, I set up a cache array to see if I've already checked for the existence of an image file for a given path.
Using the raw path string as the array index didn't work, so I used md5(), and that does the trick. However, the conversion script is taking a long time, and it seems clear that the md5 calculation is the cause. (I've been running the conversion frequently over the past few days, and I noticed right away that as soon as my caching started working, the script took much longer to run.)
So I'm wondering if there is a faster hash algorithm that I can use in my cache, and of course I need one that won't produce collisions. Since this is a one-off script, I don't need a super-secure unbreakable algorithm, just one that gets the job done a little faster.
This comment apparently is a list of all the hashing functions that PHP has available to it.
Edit: I didn't draw a lot of attention to this in my comment, but when I used the plain string of the path as the index for the cache array, it didn't work. As soon as I changed it to an md5 hash, it worked. If I had more time I would troubleshoot this, but this is a one-off project that I can't spend more time on than I absolutely must.
Post Edit Okay, apparently I'm doing something way wrong with my caching; I must have changed something when I changed the indexes to hashes that caused the cache to start working, irrespective of the hashing. People are saying my hash should be okay with file path strings, and that md5s don't take that long anyway. So, I don't know what I'm doing wrong and I don't have time to figure it out in this project. I would delete this question but it already has answers.
If these hashes are used only inside PHP and are built dynamically as this script works, why not simply use an array?
if (isset($path_cache['/some/weird/ugly/long/path'])) {
...
}
would work just as well without the MD5 calculation overhead.
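The plain-array approach above could be sketched roughly as follows. The helper name image_exists() and the $cache variable are hypothetical, and the sketch assumes the paths are ordinary filesystem paths that file_exists() can check. It uses array_key_exists() so that a cached "file is missing" result also counts as a hit.

```php
<?php
// Hypothetical helper: remembers the file_exists() result for each raw path,
// using the path string itself as the array key (no hashing involved).
function image_exists(string $path, array &$cache): bool
{
    if (!array_key_exists($path, $cache)) {
        // First time this path is seen: hit the filesystem once and store the result.
        $cache[$path] = file_exists($path);
    }
    return $cache[$path];
}
```

The cache array has to be the same variable across calls (passed by reference here), otherwise every lookup starts from an empty cache and nothing is ever reused.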
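I suggest you use plain paths for this; there is no need to hash them. However, crc32 seems to be a fast one. Keep in mind that you're trading collision resistance for speed: crc32 produces only a 32-bit value, so collisions are far more likely than with md5.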
foreach (hash_algos() as $value) {
print hash($value, 'some random') . ' - ' . $value . '<br />';
}
It will print the sample string hashed with every hashing algorithm that PHP supports.
The fastest hash function appears to be
hash('adler32', $string);
However, plain md5() works nearly as fast as the function above.
Here is a benchmark of all the hashes available in PHP:
http://l.garygolden.me/fastest-hash-php
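To compare speeds on your own machine instead of relying on a published benchmark, a rough timing loop along these lines might help. This is only a sketch: time_hash() is a hypothetical helper, the sample path and iteration count are arbitrary, and hrtime() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical micro-benchmark helper: hashes the same input repeatedly
// and returns the elapsed wall-clock time in milliseconds.
function time_hash(string $algo, string $input, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        hash($algo, $input);
    }
    return (hrtime(true) - $start) / 1e6;
}

$sample = '/some/weird/ugly/long/path'; // arbitrary sample input
$available = hash_algos();

// Only time algorithms that this PHP build actually provides.
foreach (['adler32', 'crc32b', 'md5', 'sha1'] as $algo) {
    if (in_array($algo, $available, true)) {
        printf("%-8s %.2f ms\n", $algo, time_hash($algo, $sample, 100000));
    }
}
```

Timings like these vary between PHP versions and machines, so treat the numbers as a relative comparison only.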