Java Collections API HashSet remove method
I encountered this issue while working with the Java Collections API. Basically this is a support method for an implementation of Kruskal's algorithm for finding an MST. I created this class for implementing the union/find algorithm.
My question, as I was able to find a work around, is that does anybody know of any reason why the remove method in the "union" method would not work consistently. That is at run time it would remove some elements and not others. For example I implemented this for a task involving cities and it seemed to not like removing some cities. In particular it repeatedly stumbled on a couple of different sets, but always the same ones. I wondered whethe开发者_运维知识库r it was a object reference issue, i.e. whether I was testing the wrong thing, but I could not get around it.
I know the rest of my work was correct as I was able to replace it with a loop that eliminated the element, and the algorithm executed perfectly. Probably with slightly worse performance, however.
I was wondering whether anybody can see a mistake. Also I should note that I called it from different class, however, the calls were made with elements that were retrieved using the find method. Note that the find method must work well, since simply altering the remove method made the whole thing work, i.e. it was finding and returning the appropriate objects.
Thanks
Oscar
/*
* A constructor for creating a new object of this class.
*/
DisjointSets()
{
underlying = new HashSet<HashSet<String>>();
}
/*
* A method for adding a set to this DisjointSets object
*/
void add(HashSet<String> h)
{
underlying.add(h);
}
/*
* A method for finding an element in this DisjointSet object.
*/
HashSet<String> find(String s)
{
// Check each set in the DisjointSets object
for(HashSet<String> h: underlying)
{
if(h.contains(s))
{
return h;
}
}
return null;
}
/*
* A method for combining to subsets of the DisjointSets
*/
void union(HashSet<String> h1, HashSet<String> h2)
{
System.out.print("CHECK ON DS\n");
System.out.print("*********************\n");
System.out.print("H1 is : { ");
for (HashSet<String> n: underlying)
{
System.out.print("Set is : { ");
for (String h : n)
{
System.out.print(h + " , ");
}
System.out.print("} \n ");
}
// Add the objects of h1 to h2
// DOES NOT WORK CONSISTENTLY
h1.addAll(h2);
underlying.remove(h2);
}
}
And I replaced it with
HashSet<HashSet<String>> temp = new HashSet<HashSet<String>>();
for(HashSet<String> f: underlying)
{
if(f != h2)
{
temp.add(f);
}
}
underlying = temp;
The problem is that when you modify the contents of one of the nested HashSets, you screw up the internals of the outer HashSet (because the hashCode() of the nested HashSet has changed). in order to maintain this collection correctly, whenever you want to modify one of the nested HashSets you must first remove it from the outer HashSet and then re-add it (if necessary).
(you don't really provide enough code to figure out if that's truly the problem, but that's my best guess).
Set<Set<String>> outerSet = new HashSet<String>();
Set<String> innerSet = new HashSet<String>();
innerSet.add("foo");
outerSet.add(innerSet);
// *** BROKEN ***
innerSet.add("bar"); // <- adding element to innerSet changes result of innerSet.hashCode()
outerSet.remove(innerSet); // <- this may or may not work because outerSet is _broken_
// *** BROKEN ***
// *** CORRECT ***
outerSet.remove(innerSet);
innerSet.add("bar");
// now you can put innerSet back in outerSet if necessary
Following up on @jtahlborn's answer, the contract for AbstractSet.hashCode()
says
Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode.
This implementation enumerates over the set, calling the hashCode method on each element in the collection, and adding up the results.
Code to demonstrate @jtahlborn's answer (which is correct)
import java.util.HashSet;
import java.util.Set;
public class TestHashSetHashCode {
public static void main(String[] args)
{
Set<String> strings = new HashSet<String>();
strings.add("one");
strings.add("two");
strings.add("three");
strings.add("four");
strings.add("five");
Set<String> test = new HashSet<String>();
System.out.println("Code "+test.hashCode());
for (String s : strings) {
test.add(s);
System.out.println("Code "+test.hashCode());
}
}
}
Outputs
Code 0 Code 115276 Code 3258622 Code 3368804 Code 113708290 Code 116857384
One more reason to add to the list to make use of immutable collections wherever possible.
精彩评论