Checking for duplicates in a vector [duplicate]
Possible Duplicate:
Determining if an unordered vector<T> has all unique elements
I have to check a vector for duplicates. What is the best way to approach this:
I take the first element, compare it against all other elements in the vector. Then take the next element and do the same, and so on.
Is this the best way to do it, or is there a more efficient way to check for dups?
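The pairwise approach described above can be sketched like this (the function name is just illustrative):

```cpp
#include <cassert>
#include <vector>

// Compare each element against every later element.
// O(n^2) comparisons in the worst case.
bool has_duplicates_naive(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j])
                return true;
    return false;
}
```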
If your vector is an STL container, the solution is easy:
std::sort(myvec.begin(), myvec.end());
myvec.erase(std::unique(myvec.begin(), myvec.end()), myvec.end());
According to cppreference (https://en.cppreference.com/w/cpp/algorithm/unique), the elements are shifted around so that the values from myvec.begin() to the iterator returned by std::unique are all unique. The elements after that iterator are unspecified (useless in every use-case I've seen), so remove them from the std::vector&lt;A&gt; using std::vector&lt;A&gt;::erase.
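Put together, the sort-then-unique-then-erase idiom can be exercised like this (dedupe is a made-up helper name; the sample values are arbitrary):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort so equal elements become adjacent, shift the unique values to
// the front with std::unique, then erase the unspecified tail.
std::vector<int> dedupe(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

Note that this changes the order of the elements; if the original order matters, see the set-based answers below.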
Use a hash table in which you insert each element. Before you insert an element, check if it's already there. If it is, you have yourself a duplicate. This is O(n)
on average, but the worst case is just as bad as your current method.
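A minimal sketch of the hash-table check, using std::unordered_set (the function name is illustrative):

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// O(n) on average: insert() returns {iterator, bool}, and the bool is
// false when the value was already present, i.e. we found a duplicate.
bool has_duplicates_hash(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    for (int x : v)
        if (!seen.insert(x).second)
            return true;
    return false;
}
```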
Alternatively, you can use a set to do the same thing in O(n log n)
worst case. This is as good as the sorting solution, except it doesn't change the order of the elements (uses more memory though since you create a set).
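The std::set variant looks almost identical; only the container (and with it the O(n log n) worst-case guarantee) changes, and the input vector is left untouched:

```cpp
#include <cassert>
#include <set>
#include <vector>

// O(n log n) worst case; the original vector keeps its order.
bool has_duplicates_set(const std::vector<int>& v) {
    std::set<int> seen;
    for (int x : v)
        if (!seen.insert(x).second)  // false => x was already in the set
            return true;
    return false;
}
```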
Another way is to copy your vector to another vector, sort that and check the adjacent elements there. I'm not sure if this is faster than the set solution, but I think sorting adds less overhead than the balanced search trees a set uses so it should be faster in practice.
Of course, if you don't care about keeping the original order of the elements, just sort the initial vector.
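The copy-and-sort approach can be sketched with std::adjacent_find, which returns an iterator to the first pair of equal neighbours (or the end iterator if there is none):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Taking the parameter by value makes the copy for us; the caller's
// vector keeps its original order.
bool has_duplicates_sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```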
If you don't care about an occasional false positive, you can use a Bloom filter to detect probable duplicates in the collection. If false positives can't be accepted, take the values the filter flags and run a second detection pass on those. The list of flagged values should be fairly small, although they will need to be checked against the full input.
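A rough sketch of the two-pass idea, assuming a toy Bloom filter with two hash functions (all names, the bit-array size, and the hash mixing constant are made up for illustration):

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <functional>
#include <vector>

// Toy Bloom filter: two bit positions per value. It can report false
// positives but never false negatives.
struct BloomFilter {
    std::bitset<8192> bits;
    static std::size_t h1(int v) { return std::hash<int>{}(v) % 8192; }
    static std::size_t h2(int v) { return (std::hash<int>{}(v) * 2654435761u + 1) % 8192; }
    bool possibly_contains(int v) const { return bits[h1(v)] && bits[h2(v)]; }
    void insert(int v) { bits[h1(v)] = true; bits[h2(v)] = true; }
};

// First pass: values the filter flags are only *probable* duplicates,
// but every real duplicate is guaranteed to be flagged.
std::vector<int> probable_duplicates(const std::vector<int>& input) {
    BloomFilter bf;
    std::vector<int> flagged;
    for (int v : input) {
        if (bf.possibly_contains(v)) flagged.push_back(v);
        else bf.insert(v);
    }
    return flagged;
}

// Second pass: confirm a flagged value against the full input.
bool is_real_duplicate(const std::vector<int>& input, int v) {
    return std::count(input.begin(), input.end(), v) > 1;
}
```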
Sorting and then comparing adjacent elements is the way to go. A sort takes O(n log n) comparisons, and then an additional n - 1 to compare adjacent elements.
The scheme in the question would take (n^2)/2 comparisons.
You can also use std::binary_search, but note that it requires the range to be sorted first.
Here are two good examples that will help you:
http://www.cplusplus.com/reference/algorithm/binary_search/
http://www.cplusplus.com/reference/algorithm/unique_copy/
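For example, std::unique_copy on a sorted copy yields the distinct values, so comparing sizes reveals whether the input had duplicates (the helper name is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// unique_copy skips consecutive duplicates, so on a sorted copy it
// produces exactly the distinct values.
bool has_duplicates_unique_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    std::vector<int> distinct;
    std::unique_copy(v.begin(), v.end(), std::back_inserter(distinct));
    return distinct.size() != v.size();
}
```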