Java: Test for duplicate Objects in a Collection
Given a List
of MyClass
objects (and a custom Comparitor myComparitor
if needed), what good options are there for checking if the List
contains two "equal" objects?
Edit: if there are duplicates, return a reference to 开发者_运维问答one or more of the duplicates.
Overriding MyClass.equals(MyClass)
in this case is not an option.
My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:
SortedSet mySet = new TreeSet(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time
P.S. Is there a good reference on Markdown?
If the element's equals(Object)
method does not give you the semantic that you require, then HashMap
or HashSet
are not options. Your choices are:
- Use a
TreeMap
for de-duping. This isO(NlogN)
. - Sort the
ArrayList
or a copy, then iterate over looking for element i equals element i + 1. This isO(NlogN)
. - Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache or Google collections support this, so you'll need to look further afield.)
- Create a wrapper class for your element type that overrides
equals(Object)
andhashCode()
, and de-dup using aHashSet
of wrapped objects. This isO(N)
, but the constant of proportionality will be larger than a simpleHashSet
due to creation of wrapper objects.
When de-duping with a Set
it is probably better to use a loop rather than addAll
. This is necessary if you need to know what all of the duplicates are. If you don't need to know that, then using a loop allows you to stop when you find the first duplicate. The only case where addAll
is likely to perform better is when there are likely to be no duplicates.
if you already have a sorted list, you could just look at any element and the next element, and if they're the same you have dups.
in your question you are using a TreeSet, which would have culled out duplicates already, so if you just need to know if you have duplicates, just check the size of mySet vs the size of myList. if they aren't the same you have dups.
精彩评论