开发者

Duplicate elements in java.util.Set

java.util.S开发者_运维问答et implementations removes the duplicate elements.

How are duplicates elements deleted internally in a java.util.Set?


Actually AFAIK from the sources most Set implementations in java don't even check if the element is already contained.

They just always execute the add() on their internal structure which holds the set elements and let that object handle the duplication case.

e.g. HashSet calls put(K,V) on the internal HashMap which just inserts the new object overwriting the old entry if duplicate.


Reading a little into your question I'm guessing that you're seeing strange behaviour with a java.util.HashSet (typically what everyone uses by default).

Contary to the contract of java.util.Set it is possible to get the same object in a java.util.HashSet twice like this:

import java.util.HashSet;
import java.util.Set;

public class SetTest 
{
  public static void main(String[] args) 
  {
    MyClass myObject = new MyClass(1, "testing 1 2 3");

    Set<MyClass> set = new HashSet<MyClass>();
    set.add(myObject);

    myObject.setHashCode(2);
    set.add(myObject);

    System.out.println(set.size());  // this will print 2.
  }

  private static class MyClass 
  {
    private int hashCode;
    private String otherField;

    public MyClass(int hashCode, String otherField) 
    {    
      this.hashCode = hashCode;
      this.otherField = otherField;
    }

    public void setHashCode(int hashCode) 
    {
      this.hashCode = hashCode;
    }

    public boolean equals(Object obj) 
    {    
      return obj != null && obj.getClass().equals(getClass()) && ((MyClass)obj).otherField.equals(otherField);
    }

    public int hashCode() 
    {
      return hashCode;
    }
  }
}

After the pointer from @jitter and a look at the source you can see why this would happen.

Like @jitter says, the java.util.HashSet uses a java.util.HashMap internally. When the hash changes between the first and second add a different bucket is used in the java.util.HashMap and the object is in the set twice.

The code sample may look a little contrieved but I've seen this happen in the wild with domain classes where the hash is created from mutable fields and the equals method hasn't been kept in sync with those fields.


An easy way to find this out is to look in the source for the code you are interested in.

Each JDK has a src.zip included which contains the source code for the public classes so you can just locate the source for HashSet and have a look :) I often use Eclipse for this. Start it, create a new Java project, set the JVM to be an installed JDK (if not you are using the system default JRE which doesn't have src.zip), and Ctrl-Shift-T to go to HashSet.


Read your question more detailed:

You can't add duplicates, from java doc for Set.add() or do you mean addAll?:

Adds the specified element to this set if it is not already present (optional operation). More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.


Adds the specified element to the set if it is not already present. If the set already contains the element, the call leaves the set unchanged and returns false.In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.


First off, set doesn't "Delete" duplicates, it doesn't allow entering duplicates in the first place.

Let me walk you through the implementation of set.add(e) method.

set.add(e) returns boolean stating whether e has been added in the set or not.

Let's take this simple code for example:

Duplicate elements in java.util.Set

We will get x as true and y as false.

Let us see what add() actually does:

Duplicate elements in java.util.Set

So, HashSet basically uses HashMap internally, and sends the element as key (and an empty initialized object called PRESENT as the value.). This map.put(k,v) either returns a null, if the key never existed, or it would return the old value which the key had.

Therefore while doing set.add(1) for the first time, we get null in response of map.put(1,PRESENT), and that's why we get true.

And when we call it the second time we don't get null in response to map.put(1,PRESENT) and hence the set.add(1) returns false.

(You can dig deeper into the put method, which internally calls putVal and uses hash to identify if a key is already existing, depending on which it returns a null or old Value.)

And since we are using HashMap internally, which uses hash to find uniqueness of a key, we would never end up having same element twice in a HashSet.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜