
Java HashSet using a specified method

I have a basic class 'HistoryItem' like so:

public class HistoryItem {
  private Date startDate;
  private Date endDate;
  private Info info;
  private String details;

  @Override
  public int hashCode() {
    int hash = (startDate == null ? 0 : startDate.hashCode());
    hash = hash * 31 + (endDate == null ? 0 : endDate.hashCode());
    return hash;
  }
}

I am currently using a HashSet to remove duplicates from an ArrayList on the startDate & endDate fields, which is working correctly.

However I also need to remove duplicates on different fields (info & details).

My question is this.

Is there a way to specify a different method which HashSet will use in place of hashCode()? Something like this:

public int hashCode_2() {
  int hash = (info == null ? 0 : info.hashCode());
  hash = hash * 31 + (details == null ? 0 : details.hashCode());
  return hash;
}

Set<HistoryItem> removeDups = new HashSet<HistoryItem>();
removeDups.setHashMethod(hashCode_2);

Or is there another way that I should be doing this?


You can make a wrapper class around HistoryItem with a different hashCode() implementation (and a matching equals()), then make a HashSet of wrappers around each item in the original set.
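A minimal sketch of the wrapper idea follows. The names WrapperDedup, InfoDetailsKey and dedupByInfoAndDetails are made up for illustration, and HistoryItem is reduced to two String fields (the question's Info type is replaced by String) so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class WrapperDedup {

    // Simplified stand-in for the question's HistoryItem (Info replaced by String).
    static class HistoryItem {
        final String info;
        final String details;

        HistoryItem(String info, String details) {
            this.info = info;
            this.details = details;
        }
    }

    // Hypothetical wrapper whose equals()/hashCode() look only at info and details.
    static final class InfoDetailsKey {
        final HistoryItem item;

        InfoDetailsKey(HistoryItem item) {
            this.item = item;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof InfoDetailsKey)) return false;
            HistoryItem other = ((InfoDetailsKey) o).item;
            return Objects.equals(item.info, other.info)
                    && Objects.equals(item.details, other.details);
        }

        @Override
        public int hashCode() {
            return Objects.hash(item.info, item.details);
        }
    }

    // Keeps the first item seen for each (info, details) pair, in original order.
    static List<HistoryItem> dedupByInfoAndDetails(List<HistoryItem> items) {
        Set<InfoDetailsKey> seen = new LinkedHashSet<>();
        for (HistoryItem item : items) {
            seen.add(new InfoDetailsKey(item)); // add() ignores keys already present
        }
        List<HistoryItem> result = new ArrayList<>();
        for (InfoDetailsKey key : seen) {
            result.add(key.item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<HistoryItem> items = Arrays.asList(
                new HistoryItem("login", "ok"),
                new HistoryItem("login", "ok"),
                new HistoryItem("login", "failed"));
        System.out.println(dedupByInfoAndDetails(items).size()); // prints 2
    }
}
```

Because the wrapper carries its own equals() and hashCode(), HistoryItem itself can keep the startDate/endDate versions for the first set.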


A couple of things. First and foremost, you MUST override equals() if you are going to override hashCode(). This is important. Second, if you are dealing with different fields, then you should probably have a different HashSet for each field. So you can iterate over the list like so:

// Assumes HistoryItem has getInfo()/getDetails() getters and that Info
// overrides equals() and hashCode(); items is the original list.
Set<Info> info = new HashSet<Info>();
Set<String> details = new HashSet<String>();
for (HistoryItem h : items) {
  if (info.contains(h.getInfo())) {
    // this is a dup on info
  }
  if (details.contains(h.getDetails())) {
    // this is a dup on details
  }
  info.add(h.getInfo());
  details.add(h.getDetails());
}


I ended up using GNU Trove for this.

Minimal code change was required.

A new class implementing TObjectHashingStrategy (containing computeHashCode and equals methods).

public class HistoryItemDuplicateInfo
implements TObjectHashingStrategy<HistoryItem> {

  @Override
  public int computeHashCode(HistoryItem obj) {
     ...
  }

  @Override
  public boolean equals(HistoryItem arg0, HistoryItem arg1) {
    ...
  }
}

Then use the THashSet object with a specified strategy for removing the duplicates.

THashSet<HistoryItem> hs = new THashSet<HistoryItem>(new HistoryItemDuplicateInfo());

Hope this is able to help someone out in future.


You could remove the duplicates using a java.util.TreeSet with a custom Comparator that takes your Info and Details into account.
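As a rough sketch of that approach: the class and method names below are invented for the example, HistoryItem is cut down to two String fields with getters (the real Info type would need its own ordering), and nulls are sorted first so that null fields do not throw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedup {

    // Simplified stand-in for the question's HistoryItem (Info replaced by String).
    static class HistoryItem {
        private final String info;
        private final String details;

        HistoryItem(String info, String details) {
            this.info = info;
            this.details = details;
        }

        String getInfo() { return info; }
        String getDetails() { return details; }
    }

    // Null-tolerant String ordering: null sorts before any non-null value.
    static final Comparator<String> NULL_SAFE =
            Comparator.nullsFirst(Comparator.naturalOrder());

    // Two items compare as equal (0) exactly when info and details both match.
    static final Comparator<HistoryItem> BY_INFO_AND_DETAILS =
            Comparator.comparing(HistoryItem::getInfo, NULL_SAFE)
                    .thenComparing(HistoryItem::getDetails, NULL_SAFE);

    // Keeps the first item for each (info, details) pair, in original list order.
    static List<HistoryItem> dedup(List<HistoryItem> items) {
        Set<HistoryItem> seen = new TreeSet<>(BY_INFO_AND_DETAILS);
        List<HistoryItem> result = new ArrayList<>();
        for (HistoryItem item : items) {
            if (seen.add(item)) { // false when the comparator finds an equal element
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<HistoryItem> items = Arrays.asList(
                new HistoryItem("b", "x"),
                new HistoryItem("a", "x"),
                new HistoryItem("b", "x"),
                new HistoryItem(null, "x"));
        System.out.println(dedup(items).size()); // prints 3
    }
}
```

The TreeSet here never consults HistoryItem.equals() or hashCode(), so it can coexist with the date-based versions used by the existing HashSet.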


I would suggest the following:

  • Use long for a date instead of a Date object.
  • Use just a Set if you want to avoid duplicates. Why are you using a List at all? If you need to retain an order, use a SortedSet such as TreeSet, or a Set which retains insertion order such as LinkedHashSet.
  • Can your HistoryItem be valid with null fields? Can you structure your fields so they are never null?
  • Fields which make up hashCode/equals/compareTo should be immutable. Can those fields be final? If not, why not?


HashSet is hardcoded to use hashCode() and equals(). You could implement your own HashSet-like class, possibly by ruthlessly duplicating Java's own source code, but that's plain ugly, contradicts any decent set of software development rules, and is possibly illegal with regards to Java's source code license (this depends on the actual JDK, e.g. Sun/Oracle's JDK vs OpenJDK).

You can do things with TreeSet, though. TreeSet normally uses the compareTo() method of the elements, not hashCode() or equals(). Moreover, a TreeSet instance can be built with a custom Comparator instance, which is then invoked to do comparisons, leaving you free to apply your own rules. A compareTo() method (or a Comparator.compare() method) must implement an order, which may be a bit trickier than a simple hashCode()-and-equals(), but this is usually not hard either. TreeSet is sometimes said to be slower than HashSet, but the actual difference is slight and it takes a very specific situation to actually be able to notice that difference in any way.

Conceptually, there could be a hash equivalent of Comparator for HashSet: an interface HasherAndEqualizer with int hashCode(Object obj) and boolean equals(Object obj1, Object obj2) methods. Sun did not see fit to include such an interface; I do not know why. Possibly they did not think it would be useful. The "GNU Trove" library that you cite in another answer provides such an interface.
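To make the idea concrete, here is a hedged sketch of what such an interface could look like, together with a small helper that de-duplicates a list with it. Nothing here exists in the JDK: HasherAndEqualizer, StrategyDedup and dedup are hypothetical names, the generic signature is an assumption, and HistoryItem is reduced to two String fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class StrategyDedup {

    // Hypothetical hash counterpart of Comparator; not part of the JDK.
    interface HasherAndEqualizer<T> {
        int hashCode(T obj);
        boolean equals(T a, T b);
    }

    // Simplified stand-in for the question's HistoryItem (Info replaced by String).
    static class HistoryItem {
        final String info;
        final String details;

        HistoryItem(String info, String details) {
            this.info = info;
            this.details = details;
        }
    }

    // Strategy that treats items as equal when info and details both match.
    static final HasherAndEqualizer<HistoryItem> BY_INFO_AND_DETAILS =
            new HasherAndEqualizer<HistoryItem>() {
                @Override
                public int hashCode(HistoryItem h) {
                    return Objects.hash(h.info, h.details);
                }

                @Override
                public boolean equals(HistoryItem a, HistoryItem b) {
                    return Objects.equals(a.info, b.info)
                            && Objects.equals(a.details, b.details);
                }
            };

    // Groups items into buckets by strategy hash, then checks strategy equality
    // only within a bucket. Keeps the first occurrence, in original order.
    static <T> List<T> dedup(List<T> items, HasherAndEqualizer<T> strategy) {
        Map<Integer, List<T>> buckets = new HashMap<>();
        List<T> result = new ArrayList<>();
        for (T item : items) {
            List<T> bucket =
                    buckets.computeIfAbsent(strategy.hashCode(item), k -> new ArrayList<>());
            boolean duplicate = false;
            for (T kept : bucket) {
                if (strategy.equals(kept, item)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                bucket.add(item);
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<HistoryItem> items = Arrays.asList(
                new HistoryItem("login", "ok"),
                new HistoryItem("login", "ok"),
                new HistoryItem("logout", "ok"));
        System.out.println(dedup(items, BY_INFO_AND_DETAILS).size()); // prints 2
    }
}
```

A second strategy over startDate and endDate could be passed to the same helper, which is the flexibility the question was asking HashSet for.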

Alternatively, you can always use wrappers. Instead of storing HistoryItem instances in your secondary set, you can store HistoryItemWrapper instances, each linking to an actual HistoryItem and providing the hashCode()/equals() methods you need for that set.
