Hash functions and how they work

I have two fields of different types: a char* of length n and an int. I want to generate a hash value using both as keys. I add the last 16 bits of the int variable; we'll call the sum integer x. Then I use collate::hash to generate a hash value for the char*; we'll call it integer y. I then add x and y together and hash the sum to get the final hash value. Let's say I want to limit the hash values to the range [1,4]. Can I just use hashvalue % 4 to get what I want? Also, if there is a better way of generating a hash value from the two keys, let me know.


For the range [1,4] you will have to add 1 to hashvalue % 4. However, a range of only 4 hash values is very small. It will give you a lot of collisions, limiting the effectiveness of the hash (that is, many different values of the fields will give you the same hash value).
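As a minimal sketch of that range mapping (to_bucket is a hypothetical helper name, and the default of 4 buckets is taken from the question), the reduction could look like this:

```cpp
#include <cassert>
#include <cstddef>

// Map an arbitrary hash value into the 1-based range [1, bucket_count].
// to_bucket is a hypothetical helper, not part of any library.
std::size_t to_bucket(std::size_t hash_value, std::size_t bucket_count = 4) {
    assert(bucket_count > 0);              // modulo by zero is undefined
    return hash_value % bucket_count + 1;  // % gives [0, n-1]; +1 shifts it to [1, n]
}
```

The sketch uses an unsigned type on purpose: with a signed int, the result of % takes the sign of the dividend, so a negative hash value would fall outside the intended range.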

I recommend giving the hash more bits, for example a 16-bit hash (64K possible values). That will give you fewer collisions. Also, why not use std::unordered_map, which already implements a hash table?

As for the hashing function, it depends on the meaning of each of the fields. For example, if in your implementation only the low 16 bits of the integers count, then the hash should be based only on those bits. There are general hashing functions for strings and for integers, so you could use any of them. Finally, for combining the hash values of both fields, summing (or XOR-ing) them is a common approach. Just make sure that the generated hash values are spread as evenly as possible over the range.
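Here is one way this could be sketched with the standard library. Key, KeyHash and low16 are hypothetical names, std::string stands in for the char* field, and the combining step (XOR with a one-bit shift) is just one of the common choices mentioned above, not the only reasonable one:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Keep only the low 16 bits of the int, as in the question.
// Converting to unsigned first keeps the masking well-defined for negative values.
inline std::uint32_t low16(int v) {
    return static_cast<std::uint32_t>(v) & 0xFFFFu;
}

// Hypothetical key type mirroring the two fields.
struct Key {
    int ifield;
    std::string cfield;

    // Equality must agree with the hash: only the low 16 bits of ifield count.
    bool operator==(const Key& other) const {
        return low16(ifield) == low16(other.ifield) && cfield == other.cfield;
    }
};

struct KeyHash {
    std::size_t operator()(const Key& k) const {
        std::size_t x = std::hash<std::uint32_t>{}(low16(k.ifield));
        std::size_t y = std::hash<std::string>{}(k.cfield);
        // Shifting one side before XOR-ing keeps equal x and y from cancelling to 0.
        return x ^ (y << 1);
    }
};
```

A std::unordered_map<Key, int, KeyHash> can then use this hasher directly, and the container takes care of reducing the hash to its own bucket count.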


So, what you describe in many words can be written as:

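struct noname {
  int ifield;
  char cfield[N];  // N is the fixed field length, defined elsewhere
};

// Unsigned arithmetic avoids undefined behavior if the sum overflows.
unsigned hash(const noname &n) {
  unsigned x = n.ifield;
  unsigned y = ???(n.cfield);  // any string hash function
  return x + y;
  // return (x + y) & 3;  // to limit the result to [0, 3]
}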

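Whether this hash function is good depends on the data. For example, if ifield is always a multiple of 4, it is clearly bad: ifield then contributes nothing to the low two bits, so once the result is reduced to four values it depends on cfield alone. If the values of the fields are roughly evenly distributed, everything is fine.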

Well, except for your requirement to limit the hash range to [1,4]. First, [0,3] is easier to compute. Second, such a small range would only be appropriate if you have just two or three different things that will have their hash code generated. The range should be at least twice as large as the number of expected different elements.
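To illustrate the sizing rule, here is a rough sketch that picks a power-of-two range at least twice the expected element count and reduces a hash into it with a mask, in the same spirit as the & 3 above. Both function names are hypothetical, and rounding up to a power of two is an assumption made only so that masking works:

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two that is >= 2 * expected_elements.
// Assumes expected_elements is small enough that doubling does not overflow.
std::size_t bucket_count_for(std::size_t expected_elements) {
    std::size_t wanted = 2 * expected_elements;
    std::size_t count = 1;
    while (count < wanted) {
        count <<= 1;
    }
    return count;
}

// With a power-of-two count, masking is equivalent to hash_value % count
// and yields an index in [0, count - 1].
std::size_t bucket_index(std::size_t hash_value, std::size_t count) {
    assert(count != 0 && (count & (count - 1)) == 0);  // must be a power of two
    return hash_value & (count - 1);
}
```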
