More elegant way to check for duplicates in C++ array?

2023-01-21 06:04 Q&A author:

I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:

// Check for duplicate numbers in user inputted data
    int i; // Need to declare i here so that it can be accessed by the 'inner' loop that starts on line 21
    for(i = 0;i < 6; i++) { // Check each other number in the array
        for(int j = i; j < 6; j++) { // Check the rest of the numbers
            if(j != i) { // Makes sure don't check number against itself
                if(userNumbers[i] == userNumbers[j]) {
                    b = true;
                }
            }
            if(b == true) { // If there is a duplicate, change that particular number
                cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
                cin >> userNumbers[i];
            }
        } // Comparison loop
        b = false; // Reset the boolean after each number entered has been checked
    } // Main check loop

It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.

You could sort the array in O(n log n), then simply compare each number with the next one. That is substantially faster than your existing O(n^2) algorithm, and the code is a lot cleaner. Your code also doesn't ensure that no duplicates are inserted when numbers are re-entered; you need to prevent duplicates from existing in the first place.

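// Assumes userNumbers is a std::vector<int>.
std::sort(userNumbers.begin(), userNumbers.end());
// i + 1 < size() avoids unsigned underflow when the vector is empty.
for (std::size_t i = 0; i + 1 < userNumbers.size(); ) {
    if (userNumbers[i] == userNumbers[i + 1]) {
        userNumbers.erase(userNumbers.begin() + i); // stay at i and re-check
    } else {
        ++i;
    }
}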

I also second the recommendation to use a std::set - no duplicates there.

The following solution is based on sorting the numbers and then removing the duplicates:

#include <algorithm>

int main()
{
    int userNumbers[6];

    // ...

    int* end = userNumbers + 6;
    std::sort(userNumbers, end);
    bool containsDuplicates = (std::unique(userNumbers, end) != end);
}

Indeed, the fastest and, as far as I can see, most elegant method is as advised above:

std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);

It is O(n log n). However, this does not work if the original order of the numbers in the input array needs to be kept. In that case I did:

    std::set<int> tTmp;
    std::vector<int>::iterator tNewEnd = 
        std::remove_if(tUserNumbers.begin(), tUserNumbers.end(), 
        [&tTmp] (int pNumber) -> bool {
            return (!tTmp.insert(pNumber).second);
    });
    tUserNumbers.erase(tNewEnd, tUserNumbers.end());

which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.

Cheers,

Paul

This is an extension to the answer by @Puppy, which is the current best answer.

PS: I tried to add this as a comment on the current best answer by @Puppy but couldn't, as I don't have 50 points yet. A bit of experimental data is also shared here for further help.

Both std::set and std::map are typically implemented as balanced binary search trees, so both lead to a complexity of O(n log n) in this case. Better performance can be achieved if a hash table is used: std::unordered_map offers a hash-table-based implementation with faster lookups. I experimented with all three implementations and found the results using std::unordered_map to be better than std::set and std::map. Results and code are shared below. The images are snapshots of the performance measured by LeetCode on the solutions.



bool hasDuplicate(vector<int>& nums) {
    size_t count = nums.size();
    if (!count)
        return false;
    std::unordered_map<int, int> tbl;
    //std::set<int> tbl;
    for (size_t i = 0; i < count; i++) {
        if (tbl.find(nums[i]) != tbl.end())
            return true;
        tbl[nums[i]] = 1;
        //tbl.insert(nums[i]);
    }
    return false;
}

unordered_map Performance (Run time was 52 ms here)

Set/Map Performance

You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.
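
A minimal sketch of that idea, assuming the elements are comparable with operator< (the name hasDuplicateViaSet is made up for illustration):

```cpp
#include <iterator>
#include <set>

// Sketch: inserts each element into a std::set and returns true as soon as an
// insertion is rejected, which means the value was already present.
template <class Iterator>
bool hasDuplicateViaSet(Iterator first, Iterator last) {
    std::set<typename std::iterator_traits<Iterator>::value_type> seen;
    for (; first != last; ++first) {
        if (!seen.insert(*first).second) {
            return true;  // *first was inserted earlier
        }
    }
    return false;
}
```

For the array in the question this could be called as hasDuplicateViaSet(userNumbers, userNumbers + 6).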

I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits of a base-10 number in O(n). The problem I see with the already suggested O(n) solution is that it requires the digits to be sorted first. This method is O(n) and does not require the input to be sorted. The cool thing is that checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)

/*
============================
Foo
============================
* 
   Takes in a read only unsigned int. A table is created to store counters 
   for each digit. If any digit's counter is flipped higher than 1, function
   returns. For example, with 48778584:
    0   1   2   3   4   5   6   7   8   9
   [0] [0] [0] [0] [2] [1] [0] [2] [2] [0]

   When we iterate over this array, we find that 4 is duplicated and immediately
   return false.

*/
bool Foo(int number)
{
    int temp = number;
    int digitTable[10]={0};

    while(temp > 0)
    {
        digitTable[temp % 10]++; // Last digit's respective index.
        temp /= 10; // Move to next digit
    }

    for (int i=0; i < 10; i++)
    {
        if (digitTable [i] > 1)
        {
            return false;
        }
    }
    return true;
}

It's OK, especially for small array lengths. I'd use more efficient approaches (fewer than n^2/2 comparisons) if the array is much bigger; see DeadMG's answer.

Some small corrections for your code:

Instead of int j = i, write int j = i + 1, and you can omit your if(j != i) test.
You shouldn't need to declare the i variable outside the for statement.
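
With those two corrections applied, the pairwise check could look roughly like this. It is only a sketch: the function name hasDuplicatePairwise is hypothetical, and the re-entry prompt from the question is omitted so that only the comparison logic is shown.

```cpp
#include <cstddef>

// Hypothetical helper: compares every element with each later element.
// Starting the inner loop at i + 1 means an element is never compared with
// itself, so the if (j != i) test is unnecessary, and both loop variables
// are declared inside their for statements.
bool hasDuplicatePairwise(const int* numbers, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = i + 1; j < count; ++j) {
            if (numbers[i] == numbers[j]) {
                return true;
            }
        }
    }
    return false;
}
```

This is still O(n^2), but for six numbers it performs at most 15 comparisons.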

I think @Michael Jaison G's solution is really brilliant. I modified his code a little to avoid sorting. (By using unordered_set, the algorithm may be a little faster.)

template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::unordered_set<T> values(begin, end);
    std::size_t size = std::distance(begin,end);
    return size != values.size();
}

//std::unique(_copy) requires a sorted container.
std::sort(cont.begin(), cont.end());

//testing if cont has duplicates
std::unique(cont.begin(), cont.end()) != cont.end();

//getting a new container with no duplicates
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));

#include<iostream>
#include<algorithm>

int main(){

    int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
    int len = sizeof(arr) / sizeof(*arr); // Finding length of array

    std::sort(arr, arr+len);

    int unique_elements = std::unique(arr, arr+len) - arr;

    if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
    else std::cout << "Duplicate number present in this array\n";

    return 0;
}

As mentioned by @underscore_d, an elegant and efficient solution would be,

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::vector<T> values(begin, end);

    std::sort(values.begin(), values.end());
    return (std::adjacent_find(values.begin(), values.end()) != values.end());
}

int main() {
    int user_ids[6];
    // ...
    std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}

A fast solution with O(N) expected time and O(N) space that returns as soon as it hits the first duplicate:

template <typename T>
bool containsDuplicate(vector<T>& items) {
    return any_of(items.begin(), items.end(), [s = unordered_set<T>{}](const auto& item) mutable {
        return !s.insert(item).second;
    });
}

Not enough karma to post a comment. Hence a post.

   vector <int> numArray = { 1,2,1,4,5 };
   unordered_map<int, bool> hasDuplicate;
   bool flag = false;
   for (auto i : numArray)
   {
      if (hasDuplicate[i])
      {
         flag = true;
         break;
      }
      else
         hasDuplicate[i] = true;
   }

   cout << (flag ? "Duplicate" : "No duplicate");
