Hash functions and how they work

2023-04-05 16:35

So I have two different field types, a char* of length n and an int. I want to generate a hash value using both as keys. I add the last 16 bits of the int variable; we'll call the sum integer x. Then I use collate::hash to generate a hash value for the char*; we'll call it integer y. I then add x and y together and hash the sum to generate the final hash value. Let's say I want to limit the hash values to the range [1,4]. Can I just use hashvalue%4 to get what I want? Also, if there is a better way of generating a hash value from the two keys, let me know.

For the range [1,4] you will have to add 1 to hashvalue%4. However, a hash with only 4 possible values is a very small hash. It will give you a lot of collisions (that is, many different values of the fields will map to the same hash value), which limits the effectiveness of the hash.
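As a minimal sketch of that mapping (the function name to_range_1_to_4 is hypothetical, not something from the question), the reduction can be written with an unsigned type. The unsigned type matters: in C++ the % operator on a negative int yields a negative or zero result, so a signed hash could fall outside [1,4].

```cpp
#include <cstddef>

// Map an arbitrary hash value into the range [1,4].
// hashvalue % 4 is in [0,3]; adding 1 shifts it to [1,4].
// std::size_t is unsigned, so the remainder can never be negative.
std::size_t to_range_1_to_4(std::size_t hashvalue) {
    return hashvalue % 4 + 1;
}
```

For example, to_range_1_to_4(7) evaluates to 7 % 4 + 1, which is 4.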

I recommend that you give the hash more bits, maybe a 16-bit hash (65,536 possible values). That will give you fewer collisions. Also, why not use std::unordered_map, which already implements a hash table?
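A rough sketch of the std::unordered_map route might look like the following. The names Key, KeyHash and Table are made up for illustration, and the char* field is assumed to be stored as a std::string so that equality and hashing compare the characters rather than the pointer.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type holding the two fields from the question.
struct Key {
    int ifield;
    std::string cfield;

    // unordered_map needs equality to resolve collisions within a bucket.
    bool operator==(const Key &other) const {
        return ifield == other.ifield && cfield == other.cfield;
    }
};

// Hash functor: hash each field with the standard hashers, then combine.
struct KeyHash {
    std::size_t operator()(const Key &k) const noexcept {
        std::size_t x = std::hash<int>{}(k.ifield);
        std::size_t y = std::hash<std::string>{}(k.cfield);
        return x ^ (y << 1);  // simple combination; shift so x and y do not cancel when equal
    }
};

// The container picks the bucket count and reduces the hash itself.
using Table = std::unordered_map<Key, int, KeyHash>;
```

With this in place, Table t; t[Key{1, "abc"}] = 10; stores a value under the composite key, and the container takes care of mapping the hash onto its buckets.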

As for the hashing function, it depends on the meaning of each of the fields. For example, if in your implementation only the low 16 bits of the integers count, then the hash should be based only on those bits. There are general-purpose hashing functions for strings and for integers, so you could use any of them. For combining the hash values of both fields, summing (or XOR-ing) them is a common approach. Just ensure that the generated hash values are spread as evenly over the range as possible.
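One way this could be sketched, assuming only the low 16 bits of the int are significant and that C++17 is available for std::string_view: the function name combined_hash is hypothetical, and the mixing step is the widely used "hash combine" pattern with the constant 0x9e3779b9, which is a swapped-in alternative to the plain sum or XOR mentioned above, not something the question prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>

// Hash an (int, string) pair, using only the low 16 bits of the int.
std::size_t combined_hash(int ifield, std::string_view cfield) {
    // Keep only the bits that are meaningful for this key.
    std::uint32_t low16 = static_cast<std::uint32_t>(ifield) & 0xFFFFu;

    std::size_t x = std::hash<std::uint32_t>{}(low16);
    std::size_t y = std::hash<std::string_view>{}(cfield);

    // Mix y into x so that swapping or equal inputs do not trivially cancel,
    // as they would with a bare XOR.
    return x ^ (y + static_cast<std::size_t>(0x9e3779b9u) + (x << 6) + (x >> 2));
}
```

Because the upper bits are masked off first, combined_hash(0x10001, "abc") and combined_hash(1, "abc") produce the same value, which is the intended behaviour when only the low 16 bits count.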

So, what you describe in many words can be written as:

struct noname {
  int ifield;
  char cfield[N];
};

int hash(const noname &n) {
  int x = n.ifield;
  int y = ???(n.cfield);  // placeholder: any string hash function
  return x + y;
  // return (x + y) & 3;  // variant restricted to the range [0;3]
}

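Whether this hash function is good depends on the data. For example, if ifield is always a multiple of 4, it is clearly bad: the low two bits of ifield are then always zero, so once the result is reduced to four values the integer contributes nothing. If the values of the fields are roughly evenly distributed, everything is fine.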
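The multiple-of-4 case can be checked with a small sketch. The helper name bucket4 is hypothetical, and the string hash is passed in as a plain number so the effect of the integer field is easy to see in isolation. Unsigned arithmetic is used so the addition wraps instead of overflowing.

```cpp
// Reduce the sum of the two parts to one of four values, as in (x + y) & 3.
unsigned bucket4(int ifield, unsigned string_hash) {
    return (static_cast<unsigned>(ifield) + string_hash) & 3u;
}
```

With the string hash fixed at 5, the calls bucket4(0, 5), bucket4(4, 5), bucket4(8, 5) and bucket4(400, 5) all return 1, so the integer field does nothing to separate the keys.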

Well, except for your requirement to limit the hash range to [1;4]. First, [0;3] is easier to compute. Second, such a small range would only be appropriate if you have just two or three different things that will have their hash code generated. The range should be at least twice as large as the number of expected different elements.
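A possible sketch of that sizing rule follows. Rounding up to a power of two is an added assumption (it lets the reduction use a bit mask, like the & 3 above), and the names bucket_count_for and bucket_index are hypothetical. Overflow for extremely large element counts is not handled.

```cpp
#include <cstddef>

// Smallest power of two that is at least twice the expected number of
// distinct elements (and at least 1).
std::size_t bucket_count_for(std::size_t expected_items) {
    std::size_t target = expected_items * 2;
    std::size_t buckets = 1;
    while (buckets < target) {
        buckets <<= 1;
    }
    return buckets;
}

// Zero-based bucket index in [0, buckets - 1].
// Only valid when buckets is a power of two.
std::size_t bucket_index(std::size_t hash, std::size_t buckets) {
    return hash & (buckets - 1);
}
```

For three expected elements this gives 8 buckets rather than 4, and bucket_index(13, 8) evaluates to 13 & 7, which is 5.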
