How do I hash a 2-d array efficiently (to be stored in a HashSet)?
I've written a class called PuzzleBoard that represents an nxn board. I will be keeping several PuzzleBoard objects in a HashSet, so I have to override the 'int hashCode()' method.
Below are the fields of my class:
private int N;
private int[][] puzzle;
private int blankCellX;
private int blankCellY;
private int cost;
What Eclipse automatically generated for me was:
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + N;
    result = prime * result + blankCellX;
    result = prime * result + blankCellY;
    result = prime * result + cost;
    result = prime * result + Arrays.hashCode(puzzle);
    return result;
}
Thinking that this method doesn't take into account the contents of the 2-d array, I changed it into this:
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + N;
    result = prime * result + blankCellX;
    result = prime * result + blankCellY;
    result = prime * result + cost;
    for (int i = 0; i < N; ++i)
        result = prime * result + Arrays.hashCode(puzzle[i]);
    return result;
}
However, the problem with this method is that it takes too long to complete: O(N^2). Furthermore, the 'result' variable is very likely to overflow.
Now, my question is: how do I write an efficient hash method that doesn't take too long to complete? Moreover, inserting or searching for an object in the HashSet should be efficient (near constant time).
In the worst case, N will be 10 and the HashSet will contain ~1000 PuzzleBoards.
Why am I doing all this? I'm implementing a solution for the N-Puzzle problem using the A* algorithm. At some phase of the algorithm, given the current node (a configuration of the board), I move the blank cell up, down, right or left to generate new child nodes. Because of this, puzzle configurations usually differ by only 1 or 2 cells. I'm storing all the explored nodes in a HashSet.
Thanks in advance =)
Hash codes do not need to be unique; it's just better if they are. Since you have a relatively small number of items in the HashSet (~1000), you can choose a small amount of suitable data to hash together. For example, maybe you only need the first row of the 'puzzle' table, or maybe the 'cost' variable is sufficiently different for different instances that you can use it as a good source of difference.
It doesn't matter if the result overflows: all you want is for different objects to return different hash codes if possible. The actual value of the hash is not important.
this method doesn't take into account the contents of the 2-d array
You could also use java.util.Arrays#deepHashCode(), which hashes the contents of nested arrays.
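To illustrate the difference, here is a minimal sketch. The class and method names (DeepHashDemo, contentHash, shallowHash) are made up for this example; only Arrays.hashCode and Arrays.deepHashCode are real library calls. For an int[][], Arrays.hashCode treats each row as a plain Object and uses its identity hash, while Arrays.deepHashCode descends into the rows.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original PuzzleBoard.
class DeepHashDemo {
    // Hashes the contents of every row, so grids with equal contents hash equally.
    static int contentHash(int[][] grid) {
        return Arrays.deepHashCode(grid);
    }

    // Hashes the row references only: each int[] contributes its identity hash,
    // so two separately allocated but equal grids will normally differ here.
    static int shallowHash(int[][] grid) {
        return Arrays.hashCode(grid);
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 0}};
        int[][] b = {{1, 2}, {3, 0}};
        System.out.println(contentHash(a) == contentHash(b)); // true
        System.out.println(shallowHash(a) == shallowHash(b));
    }
}
```

This is also why the Eclipse-generated version does not reflect the board contents.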
However, the problem with this method is that it takes too long to complete: O(N^2)
You can't go faster than that if you want to hash all N^2 ints in it. And if N is at most 10, why the Big-O notation anyway? O(n^2) does not mean slow. I don't think your hashCode method is inefficient; the real inefficiency (or some other O(n^2) step) is most likely somewhere else. Still, if this method is called often (and PuzzleBoard is immutable), you might want to cache the hashCode value.
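A rough sketch of the caching idea, assuming the board never changes after construction. CachedBoard and its fields are illustrative names, not the asker's actual class:

```java
import java.util.Arrays;

// Hypothetical immutable board; the class and field names are illustrative.
final class CachedBoard {
    private final int[][] cells;
    private final int hash; // computed once in the constructor

    CachedBoard(int[][] cells) {
        // Defensive copy so later changes to the caller's array cannot alter this board.
        this.cells = new int[cells.length][];
        for (int i = 0; i < cells.length; i++) {
            this.cells[i] = cells[i].clone();
        }
        this.hash = Arrays.deepHashCode(this.cells);
    }

    @Override
    public int hashCode() {
        return hash; // O(1) on every call
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedBoard)) return false;
        CachedBoard other = (CachedBoard) o;
        // Cheap hash comparison first, full content comparison only if hashes match.
        return hash == other.hash && Arrays.deepEquals(cells, other.cells);
    }
}
```

The cache is only safe because nothing can modify cells after the constructor runs; a mutable board would have to invalidate the stored value on every move.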
the 'result' variable is very likely to overflow.
No problem! Integer overflow is well defined in Java: int arithmetic simply wraps around, which is harmless for a hash code.
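A tiny sketch of that wrap-around, using the same 31-multiplier step as the question's hashCode (OverflowDemo and combine are names invented for this example):

```java
// Hypothetical helper showing that the accumulation step wraps instead of failing.
class OverflowDemo {
    // One step of the usual "prime * result + value" accumulation.
    static int combine(int result, int value) {
        return 31 * result + value;
    }

    public static void main(String[] args) {
        // No exception is thrown; the value wraps modulo 2^32.
        System.out.println(combine(Integer.MAX_VALUE, 31) == Integer.MIN_VALUE); // true
    }
}
```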
Moreover; inserting or searching an object in the HashSet should be efficient (near constant time).
Inserting is most likely only amortized constant time. When the HashSet gets full, a bigger internal table is allocated and all elements are moved into it. (java.util.HashMap, which backs HashSet, stores each entry's hash, so hashCode() itself is not called again, but the redistribution still takes time.) Try setting an initialCapacity for the HashSet.
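For example, something along these lines sizes the set up front. PresizedSetDemo and presized are hypothetical names; the HashSet(int initialCapacity) constructor and the default load factor of 0.75 are standard:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for creating a HashSet that should not need to grow.
class PresizedSetDemo {
    // Picks a capacity large enough that `expected` elements stay below
    // the resize threshold (capacity * 0.75 with the default load factor).
    static <T> Set<T> presized(int expected) {
        int capacity = (int) (expected / 0.75f) + 1;
        return new HashSet<>(capacity);
    }
}
```

With ~1000 boards you would call presized(1000); whether this makes a measurable difference at that size is something to profile rather than assume.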
result = prime * result + cost;
Are you sure you want the cost (I assume it's the depth) to be included in equals and hashCode? Two configurations are the same no matter how many steps it took me to get there, right?
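If the answer is yes, equals and hashCode could look roughly like this. BoardNode is an illustrative stand-in for PuzzleBoard, and this sketch assumes the blank position and N can be derived from the cells, so only the 2-d array takes part in identity:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical node type: identity depends on the cells only, not on the cost.
final class BoardNode {
    final int[][] puzzle;
    final int cost; // path cost, deliberately excluded from equals/hashCode

    BoardNode(int[][] puzzle, int cost) {
        this.puzzle = puzzle;
        this.cost = cost;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BoardNode
                && Arrays.deepEquals(puzzle, ((BoardNode) o).puzzle);
    }

    @Override
    public int hashCode() {
        return Arrays.deepHashCode(puzzle);
    }

    public static void main(String[] args) {
        Set<BoardNode> explored = new HashSet<>();
        explored.add(new BoardNode(new int[][] {{1, 2}, {3, 0}}, 3));
        // Same configuration reached by a longer path is recognised as explored.
        System.out.println(explored.contains(new BoardNode(new int[][] {{1, 2}, {3, 0}}, 7))); // true
    }
}
```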
~1000 PuzzleBoards
If I remember correctly, last time I solved this puzzle I had a lot more than 1000 configurations.