Collision-free hash function for a specific data structure
Is it possible to create a collision-free hash function for a data structure with specific properties?
- The data structure is int[][][]
- It contains no duplicates
- The range of integers it contains is known. Let's say it's 0..1000; the maximum integer is definitely not greater than 10000.
The big problem is that this hash function also has to be very fast. Is there a way to create such a hash function, perhaps at run time, based on the integer range?
ADDITION: I should say that the purpose of this hash function is to quickly check whether a particular combination has already been processed. When some combination of numbers in the data structure is processed, I calculate the hash value and store it. Then, when processing another combination of numbers within the data structure, I compare the hash values.
I think what you want is a "perfect hash" or even a "minimal perfect hash":
http://en.wikipedia.org/wiki/Perfect_hash_function
Edit: That said, if you're certain you'll never go above [0...1000], then depending on what you need to do, you can probably just "bucket" your results directly in an array. If you don't have many elements, that array will be sparse (and hence a bit of a waste), but for at most 1001 elements covering [0...1000], an Object[1001] (or int[1001] or whatever) will probably do, as in the sketch below.
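For illustration, here is a minimal sketch of that direct-bucketing idea, assuming the values to mark are single integers in 0..1000 (the bool[1001] array and the example values are my own, not from the answer):

using System;

public class DirectBucketExample
{
    public static void Main()
    {
        // One slot per possible value, so the value itself acts as the index
        // and no hashing is needed at all. Size 1001 covers 0..1000 inclusive.
        bool[] processed = new bool[1001];

        processed[42] = true;

        Console.WriteLine(processed[42]);  // "True"
        Console.WriteLine(processed[999]); // "False"
    }
}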
What if you just use a 64-bit value and pack the position at each level of the hierarchy into its own section of bits?
Something like this (off the top of my head): hash = (a << 34) | (b << 17) | c
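A minimal sketch of that packing scheme (the 17-bit fields mirror the shifts above and comfortably cover values up to 10000; the helper name Pack is my own):

using System;

public class PackedKeyExample
{
    // Packs three indices into a single 64-bit key. Assumes each index
    // fits in 17 bits (0..131071), which covers the stated 0..10000 range.
    public static long Pack(int a, int b, int c)
    {
        return ((long)a << 34) | ((long)b << 17) | (long)c;
    }

    public static void Main()
    {
        Console.WriteLine(Pack(500, 600, 700) == Pack(500, 600, 700)); // "True"
        Console.WriteLine(Pack(500, 600, 700) == Pack(700, 600, 500)); // "False"
    }
}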
A perfect hash is likely not feasible, because it can take a lot of computation time to find one for your data set.
Would a bool[][][] work for you, where true means a certain x,y,z combination has been processed? Below is a prototype for a three-dimensional bit array. Because of the limits of an Int32, this will only work up to a maximum index of about 1,024 (but would fit within 128 MB). You could get to 10,000 by creating a BitArray[][], but this is probably not practical at that size, because it would occupy over 116 GB of RAM.
Depending on your exact problem size and needs, a plain old hash table (with collisions) may be your best bet. That said, here is the prototype code:
using System;
using System.Collections;

public class ThreeDimensionalBitArray
{
    // todo: consider making the size configurable
    private const int MAX_INDEX = 1000;

    // One bit per (x, y, z) combination; valid indices are 0..MAX_INDEX-1.
    private BitArray _bits = new BitArray(MAX_INDEX * MAX_INDEX * MAX_INDEX);

    public bool this[int x, int y, int z]
    {
        get { return _bits[getBitIndex(x, y, z)]; }
        set { _bits[getBitIndex(x, y, z)] = value; }
    }

    private static int getBitIndex(int x, int y, int z)
    {
        // todo: bounds check x, y, and z
        return (x * MAX_INDEX * MAX_INDEX) + (y * MAX_INDEX) + z;
    }
}

public class BitArrayExample
{
    public static void Main()
    {
        ThreeDimensionalBitArray bitArray = new ThreeDimensionalBitArray();
        Console.WriteLine(bitArray[500, 600, 700]); // "False"
        bitArray[500, 600, 700] = true;
        Console.WriteLine(bitArray[500, 600, 700]); // "True"
    }
}
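For completeness, here is a sketch of the "plain old hash table" alternative mentioned above, using a HashSet of 64-bit keys built with the 17-bits-per-index packing from the earlier answer (that packing choice is an assumption on my part, not part of the original prototype):

using System;
using System.Collections.Generic;

public class HashSetExample
{
    public static void Main()
    {
        // A regular hash set of packed keys; hash collisions are handled by
        // the framework, so no perfect hash function is required.
        HashSet<long> processed = new HashSet<long>();

        long key = ((long)500 << 34) | ((long)600 << 17) | 700; // 17 bits per index (assumed)

        Console.WriteLine(processed.Contains(key)); // "False"
        processed.Add(key);
        Console.WriteLine(processed.Contains(key)); // "True"
    }
}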