Case-insensitive Bob Jenkins Hash?

2022-12-09 17:49 问答作者：

Is there a case-insensitive variant of the Bob Jenkins hash function?

Generics.Defaults.BobJenkinsHash

provides a fast hash function. Unfortunately it cannot be used in combination with a case-insensitive compare function like so

TCustomStringComparer = class (TEqualityComparer <String>)
  function Equals(const Left, Right: String): Boolean; override;
  function GetHashCode(const Value: String): Integer; override;
end;
function TCustomStringComparer.Equals (const Left, Right : String) : Boolean;
begin
  Result := CompareText (Left, Right) = 0;
end;
function TCustomStringComparer.GetHashCode (const Value : Str开发者_如何学Going) : Integer;
begin
  Result := Generics.Defaults.BobJenkinsHash (Value [1], Length (Value) * SizeOf (Char), 0);
end;

This is because TDictionary first compares the hash codes and then uses the provided comparer when checking for equality.

Of course I could use UpperCase in my GetHashCode function, but I wondered if it would be faster if I could somehow modify the hash function itself.

No, there is no case invariant version of the hash function. Lower or upper case all strings before passing them to the hash function.

It would be slightly faster, but it hurts your maintainability a lot. There is rarely a good reason for this type of micro-optimization. Just convert your strings to lower or upper case before hashing like you suggested.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" - Donald Knuth

IMO the whole question is wrong. To quote the Wikipedia article on hash functions:

A hash function is any well-defined procedure or mathematical function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.

Note the "amount of data" - there is no type specified, and indeed the Bob Jenkins hash function has an untyped parameter const Data pointing to the data to be hashed. Since the input data is not necessarily a sequence of characters there is no way to compute a "case-insensitive" hash value. And even if it were a sequence of characters - the upper- or lowercasing would depend on character set and encoding. So you would need to have different hash functions for ASCII strings, UTF-8 encoded strings, UTF-16 LE encoded strings, ... (you get the idea).

I also needed such a function in a project. Bob Jenkin's one-at-a-time hash:

function hash(const s: string): cardinal;
var
  p, last: PByte;
begin
  if s = '' then exit(1);
  p := pbyte(pointer(s));
  last := p + length(s);
  result := 0;
  while p < last do begin
    if {$ifdef asciionly}p^ < 128{$else}true{$endif}  then begin
      result := result + p^;
      if (p^ >= ord('a')) and (p^ <= ord('z')) then result := result - ord('a') + ord('A');
      result := result + (result shl 10);
      result := result xor (result shr 6);
    end;
    inc(p);
  end;

  result := result + (result shl 3);
  result := result xor (result shr 11);
  result := result + (result shl 15);
end;

If asciionly is set, it should also give the same hash for utf-8 and latin1 strings.

Do not forget to disable overflow checking.

继续阅读：delphi delphi-2010 hash

Case-insensitive Bob Jenkins Hash?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？