开发者

Removing duplicates without overriding hash method

I have a List which contains a list of objects and I want to remove from this list all the elements which have the same values in two of their attributes. I had though about doing somethi开发者_JAVA百科ng like this:

List<Class1> myList;
....
Set<Class1> mySet = new HashSet<Class1>();
mySet.addAll(myList);

and overriding hash method in Class1 so it returns a number which depends only in the attributes I want to consider.

The problem is that I need to do a different filtering in another part of the application so I can't override hash method in this way (I would need two different hash methods).

What's the most efficient way of doing this filtering without overriding hash method?

Thanks


Overriding hashCode and equals in Class1 (just to do this) is problematic. You end up with your class having an unnatural definition of equality, which may turn out to be other for other current and future uses of the class.

Review the Comparator interface and write a Comparator<Class1> implementation to compare instances of your Class1 based on your criteria; e.g. based on those two attributes. Then instantiate a TreeSet<Class>` for duplicate detection using the TreeSet(Comparator) constructor.

EDIT

Comparing this approach with @Tom Hawtin's approach:

  • The two approaches use roughly comparable space overall. The treeset's internal nodes roughly balance the hashset's array and the wrappers that support the custom equals / hash methods.

  • The wrapper + hashset approach is O(N) in time (assuming good hashing) versus O(NlogN) for the treeset approach. So that is the way to go if the input list is likely to be large.

  • The treeset approach wins in terms of the lines of code that need to be written.


Let your Class1 implements Comparable. Then use TreeSet as in your example (i.e. use addAll method).


As an alternative to what Roman said you can have a look at this SO question about filtering using Predicates. If you use Google Collections anyway this might be a good fit.


I would suggest introducing a class for the concept of the parts of Class1 that you want to consider significant in this context. Then use a HashSet or HashMap.


Sometimes programmers make things too complicated trying to use all the nice features of a language, and the answers to this question are an example. Overriding anything on the class is overkill. What you need is this:

class MyClass {
  Object attr1;
  Object attr2;
}

List<Class1> list;
Set<Class1> set=....
Set<MyClass> tempset = new HashSet<MyClass>;

for (Class1 c:list) {
  MyClass myc = new MyClass();
  myc.attr1 = c.attr1;
  myc.attr2 = c.attr2;

  if (!tempset.contains(myc)) {
    tempset.add(myc);
    set.add(c);
  }
}

Feel free to fix up minor irregulairites. There will be some issues depending on what you mean by equality for the attributes (and obvious changes if the attributes are primitive). Sometimes we need to write code, not just use the builtin libraries.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜