Case-insensitive Bob Jenkins Hash?
Is there a case-insensitive variant of the Bob Jenkins hash function?
Generics.Defaults.BobJenkinsHash
provides a fast hash function. Unfortunately it cannot be used in combination with a case-insensitive compare function like so
TCustomStringComparer = class (TEqualityComparer <String>)
function Equals(const Left, Right: String): Boolean; override;
function GetHashCode(const Value: String): Integer; override;
end;
function TCustomStringComparer.Equals (const Left, Right : String) : Boolean;
begin
Result := CompareText (Left, Right) = 0;
end;
function TCustomStringComparer.GetHashCode (const Value : Str开发者_如何学Going) : Integer;
begin
Result := Generics.Defaults.BobJenkinsHash (Value [1], Length (Value) * SizeOf (Char), 0);
end;
This is because TDictionary first compares the hash codes and then uses the provided comparer when checking for equality.
Of course I could use UpperCase in my GetHashCode
function, but I wondered if it would be faster if I could somehow modify the hash function itself.
No, there is no case invariant version of the hash function. Lower or upper case all strings before passing them to the hash function.
It would be slightly faster, but it hurts your maintainability a lot. There is rarely a good reason for this type of micro-optimization. Just convert your strings to lower or upper case before hashing like you suggested.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" - Donald Knuth
IMO the whole question is wrong. To quote the Wikipedia article on hash functions:
A hash function is any well-defined procedure or mathematical function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.
Note the "amount of data" - there is no type specified, and indeed the Bob Jenkins hash function has an untyped parameter const Data
pointing to the data to be hashed. Since the input data is not necessarily a sequence of characters there is no way to compute a "case-insensitive" hash value. And even if it were a sequence of characters - the upper- or lowercasing would depend on character set and encoding. So you would need to have different hash functions for ASCII strings, UTF-8 encoded strings, UTF-16 LE encoded strings, ... (you get the idea).
I also needed such a function in a project. Bob Jenkin's one-at-a-time hash:
function hash(const s: string): cardinal;
var
p, last: PByte;
begin
if s = '' then exit(1);
p := pbyte(pointer(s));
last := p + length(s);
result := 0;
while p < last do begin
if {$ifdef asciionly}p^ < 128{$else}true{$endif} then begin
result := result + p^;
if (p^ >= ord('a')) and (p^ <= ord('z')) then result := result - ord('a') + ord('A');
result := result + (result shl 10);
result := result xor (result shr 6);
end;
inc(p);
end;
result := result + (result shl 3);
result := result xor (result shr 11);
result := result + (result shl 15);
end;
If asciionly is set, it should also give the same hash for utf-8 and latin1 strings.
Do not forget to disable overflow checking.
精彩评论