Bloom Filter: How to find k hash functions?
A Bloom Filter needs k hash functions, which return a value between 0 and m (m is the length of the bit array). I have to implement such a bloom filter and I already read some theoretical papers about t开发者_如何学运维hose filter (how they work, how many hash functions you need, how the error behaves etc.)
Now I have two questions about hash functions:
- How do I find k hash functions - which hash functions should I use?
- How do I find hashs functions which return a value between 0 and m? Alternatively, how can I map the output of a hash function to the range 0-m?
You can use this - http://en.wikipedia.org/wiki/Universal_hashing
You can use the Kirsch-Mitzenmacher optimization:
hash_i = hash1 + i * hash2
Where hash1 is 32 bit, hash2 is 32 bit. In a look you could use:
hash1 = upper 32 bit of a 64 bit hash
hash2 = lower 32 bit of a 64 bit hash
hash = hash1
for(int i=0; i<k; i++) {
hash += hash2
}
You can get a bunch of hash values at one time with a cryptographic hash function like MD5 or SHA. Divide the cryptographic hash value into N m-bit pieces, and treat them just like you would the output of N normal hash functions.
The characteristics of the k hash functions that should be used are
given on the wikipedia page: Bloom Filter and go in the
algorithm description where it says - The requirement of designing k different independent hash functions...For good tutorials on universal hashing : Nice MIT lecture
U could use any hash functions....there are many hash simple hash functions available in the net. Refer http://en.wikipedia.org/wiki/List_of_hash_functions
Use any hash function to get a hash value and in order to get it in the range of 0-m , use the modulus(%) operator. i.e bitlocation = hash % m;
it might not be efficient ,but it works...
精彩评论