PHP: make a unique hash of an RSS description
I'm using PHP to create a sort of RSS aggregator that stores data from multiple sites' RSS feeds in a MySQL database. Since the same article can appear on many websites, I want to avoid storing duplicates. I've been told you can use hashing to make a unique hash based on the content of the RSS item (description + title). Which hashing algorithm is fastest and produces the fewest characters, so that I can use it for comparison to avoid duplicates?
Thanks in advance.
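sprintf('%u', crc32($string)) gives you one of 4,294,967,296 possible values, and it's much shorter than md5 or sha1 output. It's only 32 bits wide, though, so accidental collisions become likely once you have tens of thousands of items (roughly a 50% chance of at least one collision at about 77,000 items).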
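A minimal sketch of the CRC32 approach described above, assuming the key is built from the title plus the description. `item_crc_key` is a made-up helper name for illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper: build a short 32-bit key for one feed item.
function item_crc_key(string $title, string $description): string
{
    // The "\n" separator keeps ("ab", "c") distinct from ("a", "bc").
    $checksum = crc32($title . "\n" . $description);

    // '%u' renders the checksum as an unsigned decimal string, which
    // matters on 32-bit PHP builds where crc32() can return a negative int.
    return sprintf('%u', $checksum);
}

$key = item_crc_key('Some headline', 'Some description text');
```

The result is a decimal string of at most 10 digits, so it also fits an unsigned 32-bit integer column if you prefer to store it as a number.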
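To avoid false duplicates you should use a hash with a much wider output, such as MD5 (128 bits) or SHA-1 (160 bits). Neither is considered secure against deliberately crafted collisions any more, but that doesn't matter for spotting accidental duplicates.
MD5 is the faster of the two and produces a hash that is 32 hexadecimal characters long.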
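<?php
// Hash description and title together. The "\n" separator keeps
// ("ab", "c") from producing the same input as ("a", "bc").
$hash = md5($description . "\n" . $title);
?>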
I used it in my RSS parser for exactly the same purpose, and it works like a charm.
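Here is a rough sketch of how the MD5 check could be used to skip duplicates. The normalisation step (stripping tags, collapsing whitespace, lowercasing) is my own assumption, added because the same article often arrives with slightly different markup from different feeds. `item_fingerprint` and the sample `$items` array are made up for illustration.

```php
<?php
// Hypothetical helper: normalise the text, then hash it with MD5.
function item_fingerprint(string $title, string $description): string
{
    $normalise = function (string $text): string {
        $text = strip_tags($text);                          // drop HTML markup
        $text = preg_replace('/\s+/', ' ', $text) ?? $text; // collapse whitespace runs
        return strtolower(trim($text));                     // ignore case and edge spaces
    };

    return md5($normalise($title) . "\n" . $normalise($description));
}

// Sample data standing in for parsed feed items.
$items = [
    ['title' => 'PHP 8 released', 'description' => '<p>Big news  today</p>'],
    ['title' => 'php 8 Released', 'description' => 'Big news today'],
    ['title' => 'Other story',    'description' => 'Something else'],
];

$seen   = []; // fingerprint => true
$unique = []; // items kept after de-duplication

foreach ($items as $item) {
    $key = item_fingerprint($item['title'], $item['description']);
    if (isset($seen[$key])) {
        continue; // same fingerprint already stored, skip it
    }
    $seen[$key] = true;
    $unique[]   = $item;
}
```

In the real aggregator you would presumably store the fingerprint in an indexed column (a UNIQUE index would let MySQL reject repeats for you) instead of keeping the `$seen` array in memory.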