3 hash functions to best hash sliding window strings for a bloom filter with minimum collisions
I need 3 hash functions to hash strings of a sliding window moving over a text, to be used later to search within a bloom vector. I'm using C# in my programming
I read something about rolling hash functions and cyclic polynomials, they are used for sliding window applications. But really, I did not find any codes, they are just descriptions
So please, if anyone have any idea about 3 best C# hash functions to use with sliding window strings of fixed size (5-char), that consume less time and have 开发者_Python百科minimum number of collisions, either they are rolling hash functions or others, please help me with some C# codes or links to hash functions names
The strings are all hex, I mean consist of (0-9) and (A-F) capital letters only, along with the dash character (-) ... for example
my string can be AB-2C-65-ED-65
Duaa
Alpha chars only? Upper and lower? If you have just 5 case insensitive [a-z] chars, you can pack them into 5*5 = 25 bits phenomenally quickly with bit shifting since 26 < 2^5
. So a 32-bit int could pack the entire string with zero hash collisions.
[EDIT]
OK, now that you've clarified the input.
[0-9], [A-Z] + ['-'] is 37 characters... just over the 32 character limit for 5 bits. So, you'll need 6 bits to store each char. Luckily, even with 5 chars, that's only 6*5 = 30
bits total, so you can completely pack the entire string into a 32 bit int, and won't even need to fiddle the sign bit, so you'll have a very fast, completely lossless (zero collision) hash code that uniquely identifies each string... that could even be reversed! All numbers will be positive, so you could assign some other meaning to the sign bit if desired.
Below is your new function to pack 5 characters in the range [0-9], [a-z] + ['-'] into an int. You will first need to take the Hex string and dehex it back to its normal state, and then run it through the bit twiddling below.
string c; //This is your text to hash.
public override int GetHashCode()
{
if (c.Length > 5) throw new InvalidOperationException("Sliding window string too long");
const char dash = (char)45;
//Pack all characters right-aligned within the int
return (c[0]-dash << 24) | (c[1]-dash << 18) | (c[2]-dash << 12) | (c[3]-dash << 6) | c[4]-dash;
}
Note: You can also safely use a few other characters between ASCII 45 and ASCII 109 like: "./:;<=>?[\]^_
" without modifying this code.
精彩评论