Checking for duplicates in a vector [duplicate]

2022-12-30 23:30

This question already has answers here: Closed 12 years ago.

Possible Duplicate:
Determining if an unordered vector<T> has all unique elements

I have to check a vector for duplicates. What is the best way to approach this:

I take the first element, compare it against all other elements in the vector. Then take the next element and do the same and so on.

Is this the best way to do it, or is there a more efficient way to check for dups?

If your vector is a std::vector (or another STL sequence container), sorting it and removing the duplicates is easy:

std::sort(myvec.begin(), myvec.end());
myvec.erase(std::unique(myvec.begin(), myvec.end()), myvec.end());

According to cppreference (https://en.cppreference.com/w/cpp/algorithm/unique), the elements are shifted around so that the values from myvec.begin() to the return value of std::unique are all unique. The elements after the iterator returned by std::unique are unspecified (useless in every use-case I've seen) so remove them from the std::vector<A> using std::vector<A>::erase.
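As a minimal sketch of that sort/unique/erase sequence, the helper below also reports whether anything was removed, which answers the original "are there duplicates?" question. The function name sort_and_remove_duplicates is made up for illustration, and the element type is assumed to support operator< and operator==.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts `values`, drops repeated elements, and
// returns true if at least one duplicate was dropped.
template <typename T>
bool sort_and_remove_duplicates(std::vector<T>& values) {
    std::sort(values.begin(), values.end());
    // std::unique moves the first element of each run of equal values to the
    // front and returns an iterator one past the last kept element.
    auto new_end = std::unique(values.begin(), values.end());
    const bool had_duplicates = (new_end != values.end());
    // The tail [new_end, end) holds unspecified values, so erase it.
    values.erase(new_end, values.end());
    return had_duplicates;
}
```

For example, a vector holding 3, 1, 3, 2, 1 ends up as 1, 2, 3 and the function returns true.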

Use a hash table in which you insert each element. Before you insert an element, check if it's already there. If it is, you have yourself a duplicate. This is O(n) on average, but the worst case is just as bad as your current method.
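A rough sketch of the hash table idea, assuming std::unordered_set as the hash table and an element type that std::hash supports. The name has_duplicates_hashed is only illustrative.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns true if any value occurs more than once.
// Assumes T works with std::hash and operator==.
template <typename T>
bool has_duplicates_hashed(const std::vector<T>& values) {
    std::unordered_set<T> seen;
    seen.reserve(values.size());  // avoid rehashing while inserting
    for (const T& value : values) {
        // insert() returns a pair; .second is false if the value was already present.
        if (!seen.insert(value).second) {
            return true;
        }
    }
    return false;
}
```

The input vector is not modified, and the loop stops at the first repeated value.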

Alternatively, you can use a set to do the same thing in O(n log n) worst case. This is as good as the sorting solution, except it doesn't change the order of the elements (uses more memory though since you create a set).
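The set variant might look like the sketch below. Instead of stopping at the first hit, it collects the duplicated values, which is one possible design rather than anything the answer prescribes. The name duplicated_values is an assumption, and T is assumed to be ordered by operator<.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: returns every value that appears more than once.
// The input vector keeps its original order because it is never modified.
template <typename T>
std::set<T> duplicated_values(const std::vector<T>& values) {
    std::set<T> seen;
    std::set<T> duplicates;
    for (const T& value : values) {
        // Each insert is O(log n), giving O(n log n) overall.
        if (!seen.insert(value).second) {
            duplicates.insert(value);
        }
    }
    return duplicates;
}
```

An empty result means the vector has no duplicates.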

Another way is to copy your vector to another vector, sort that and check the adjacent elements there. I'm not sure if this is faster than the set solution, but I think sorting adds less overhead than the balanced search trees a set uses so it should be faster in practice.
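A sketch of the copy-and-sort approach, assuming std::adjacent_find is an acceptable way to "check the adjacent elements". Taking the parameter by value makes the copy, so the caller's vector stays in its original order. The function name is illustrative.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: `values` is passed by value, so sorting only
// affects the local copy.
template <typename T>
bool has_duplicates_sorted_copy(std::vector<T> values) {
    std::sort(values.begin(), values.end());
    // After sorting, equal elements are neighbours; adjacent_find returns
    // end() when no two neighbouring elements compare equal.
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```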

Of course, if you don't care about keeping the original order of the elements, just sort the initial vector.

If you don't care about an occasional false positive, you can use a Bloom Filter to detect probable duplicates in the collection. If false positives can't be accepted, take the values that fail the filter and run a second detection pass on those. The list of failed values should be fairly small, although they will need to be checked against the full input.
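To make the two-pass idea concrete, here is a toy sketch for int values. Everything in it is an assumption made for illustration: the BloomFilter class, the three bit positions derived from one std::hash value by double hashing, and the filter size of 8 bits per element plus 64. It is not a tuned implementation.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy Bloom filter (hypothetical, untuned).
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bit_count) : bits_(bit_count, false) {}

    // Sets the bits for `value`. Returns true if all of them were already
    // set, meaning the value was possibly (not certainly) seen before.
    bool test_and_set(int value) {
        const std::size_t h1 = std::hash<int>{}(value);
        // Second hash for double hashing; forced odd so the step is never zero.
        const std::size_t h2 = ((h1 >> 16) ^ (h1 * 0x9E3779B97F4A7C15ull)) | 1u;
        bool possibly_seen = true;
        for (std::size_t i = 0; i < kHashes; ++i) {
            const std::size_t pos = (h1 + i * h2) % bits_.size();
            if (!bits_[pos]) {
                possibly_seen = false;
                bits_[pos] = true;
            }
        }
        return possibly_seen;
    }

private:
    static constexpr std::size_t kHashes = 3;
    std::vector<bool> bits_;
};

// Pass 1 collects candidates flagged by the filter. Pass 2 counts only those
// candidates against the full input, which discards any false positives.
std::unordered_set<int> confirmed_duplicates(const std::vector<int>& values) {
    BloomFilter filter(values.size() * 8 + 64);
    std::unordered_map<int, std::size_t> candidate_counts;
    for (int value : values) {
        if (filter.test_and_set(value)) {
            candidate_counts.emplace(value, 0);
        }
    }
    for (int value : values) {
        auto it = candidate_counts.find(value);
        if (it != candidate_counts.end()) {
            ++it->second;
        }
    }
    std::unordered_set<int> duplicates;
    for (const auto& entry : candidate_counts) {
        if (entry.second > 1) {
            duplicates.insert(entry.first);
        }
    }
    return duplicates;
}
```

A Bloom filter has no false negatives, so every real duplicate becomes a candidate in the first pass. The exact map only has to hold the candidates rather than every element.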

Sorting and then comparing adjacent elements is the way to go. A sort takes O(n log n) comparisons, and then an additional n-1 to compare adjacent elements.

The scheme in the question would take about (n^2)/2 comparisons in the worst case (exactly n(n-1)/2 when there are no duplicates).
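To see where that count comes from, here is a sketch of the pairwise scheme from the question with a comparison counter. The function name and the `comparisons` out-parameter are hypothetical and exist only to make the count visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: compares each element with every later element.
// `comparisons` receives the number of == tests that were performed.
template <typename T>
bool has_duplicates_pairwise(const std::vector<T>& values, std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            ++comparisons;
            if (values[i] == values[j]) {
                return true;  // stop at the first duplicate
            }
        }
    }
    return false;
}
```

With 5 distinct elements the loops run 4 + 3 + 2 + 1 = 10 comparisons, which is 5 * 4 / 2.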

You can also use binary_search, which requires the range it searches to be sorted.

Here are two good examples that will help you:

http://www.cplusplus.com/reference/algorithm/binary_search/

http://www.cplusplus.com/reference/algorithm/unique_copy/
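The answer does not say how to apply these two algorithms, so the following is only one possible reading. The first function keeps a sorted vector of values seen so far and searches it with std::binary_search. The second sorts a copy and uses std::unique_copy to extract the distinct values. Both function names are made up.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: `seen` is kept sorted so std::binary_search is valid.
// Each lookup is O(log n), but inserting into the middle of a vector is
// O(n), so the worst case is still quadratic in element moves.
template <typename T>
bool has_duplicates_binary_search(const std::vector<T>& values) {
    std::vector<T> seen;
    seen.reserve(values.size());
    for (const T& value : values) {
        if (std::binary_search(seen.begin(), seen.end(), value)) {
            return true;
        }
        seen.insert(std::upper_bound(seen.begin(), seen.end(), value), value);
    }
    return false;
}

// Hypothetical helper: returns the distinct values in sorted order.
// unique_copy copies only the first element of each run of equal values.
template <typename T>
std::vector<T> unique_values(std::vector<T> values) {
    std::sort(values.begin(), values.end());
    std::vector<T> result;
    std::unique_copy(values.begin(), values.end(), std::back_inserter(result));
    return result;
}
```

If unique_values returns fewer elements than the input had, the input contained duplicates.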
