开发者

3 hash functions to best hash sliding window strings for a bloom filter with minimum collisions

I need 3 hash functions to hash strings of a sliding window moving over a text, to be used later to search within a bloom vector. I'm using C# in my programming

I read something about rolling hash functions and cyclic polynomials, they are used for sliding window applications. But really, I did not find any codes, they are just descriptions

So please, if anyone have any idea about 3 best C# hash functions to use with sliding window strings of fixed size (5-char), that consume less time and have 开发者_Python百科minimum number of collisions, either they are rolling hash functions or others, please help me with some C# codes or links to hash functions names

The strings are all hex, I mean consist of (0-9) and (A-F) capital letters only, along with the dash character (-) ... for example

my string can be AB-2C-65-ED-65

Duaa


Alpha chars only? Upper and lower? If you have just 5 case insensitive [a-z] chars, you can pack them into 5*5 = 25 bits phenomenally quickly with bit shifting since 26 < 2^5. So a 32-bit int could pack the entire string with zero hash collisions.

[EDIT]

OK, now that you've clarified the input.

[0-9], [A-Z] + ['-'] is 37 characters... just over the 32 character limit for 5 bits. So, you'll need 6 bits to store each char. Luckily, even with 5 chars, that's only 6*5 = 30 bits total, so you can completely pack the entire string into a 32 bit int, and won't even need to fiddle the sign bit, so you'll have a very fast, completely lossless (zero collision) hash code that uniquely identifies each string... that could even be reversed! All numbers will be positive, so you could assign some other meaning to the sign bit if desired.

Below is your new function to pack 5 characters in the range [0-9], [a-z] + ['-'] into an int. You will first need to take the Hex string and dehex it back to its normal state, and then run it through the bit twiddling below.

string c; //This is your text to hash.

public override int GetHashCode()
{
    if (c.Length > 5) throw new InvalidOperationException("Sliding window string too long");

    const char dash = (char)45;
    //Pack all characters right-aligned within the int
    return (c[0]-dash << 24) | (c[1]-dash << 18) | (c[2]-dash << 12) | (c[3]-dash << 6) | c[4]-dash;
}

Note: You can also safely use a few other characters between ASCII 45 and ASCII 109 like: "./:;<=>?[\]^_" without modifying this code.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜