Find Duplicates in an array in O(N) time

Is there a way to find all the duplicate elements in an array of N elements in O(N) time?

Example:

Input: 11, 29, 81, 14, 43, 43, 81, 29

Output: 29, 81, 43

Sorting the input and doing a linear scan to detect duplicates destroys the order and gives the output: 29, 43, 81.

Sorting another array of indices {0, 1, ..., N-1} by key according to the given array to get {1, 4, 2}, and then sorting the resulting set of indices to get {1, 2, 4}, will give us {29, 81, 43}, but this takes O(N log N) time.
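
For reference, the index-sorting approach just described might look roughly like this. It is only a sketch: the function name is made up for illustration, and it relies on Python's built-in sort being stable so that equal values stay in index order.

```python
def duplicates_by_index_sort(values):
    """O(N log N) baseline: sort indices by value, then sort the hits by index."""
    # Stable sort of indices by value: equal values stay in index order.
    order = sorted(range(len(values)), key=lambda i: values[i])

    first_indices = []
    k, n = 0, len(order)
    while k < n:
        # Find the end of the run of equal values starting at position k.
        j = k + 1
        while j < n and values[order[j]] == values[order[k]]:
            j += 1
        if j - k > 1:
            # The run has at least two members: remember its first index.
            first_indices.append(order[k])
        k = j

    # Restore input order by sorting the first-occurrence indices.
    first_indices.sort()
    return [values[i] for i in first_indices]


print(duplicates_by_index_sort([11, 29, 81, 14, 43, 43, 81, 29]))  # [29, 81, 43]
```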

Is there an O(N) algorithm to solve this problem?

P.S. I forgot to add: I don't want to use hash tables. I am looking for a non-hash solution.


I believe a good solution (decent memory usage, can be used to immediately determine if an entry has already been seen thus preserving order, and with a linear complexity) is a trie.

If you insert the elements into the trie as if they were a string with each digit (starting from the MSD) in each node, you can pull this off with a complexity of O(m N) where m is the average length of numbers in base-10 digits.

You'd just loop over all your entries and insert them into the trie. Each time an element already exists in the trie, you know it is a duplicate and can report it before moving on to the next. Duplicates in this (unlike in my previous answer of a Radix Sort) will be found immediately instead of in the last iteration or what not.

I'm not sure if you would benefit from using a suffix tree here, as the "base" of the characters being entered into the trie is only 10 (compared to the base-128 for ASCII strings), but it's possible.
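
A minimal sketch of the trie idea, assuming non-negative integers. The class and function names are invented for illustration. Each node has ten fixed child slots, so no hashing is involved. One deviation from the description above: instead of reporting a duplicate the moment it is found, this version counts occurrences in the trie and makes a second pass over the input, so that the output comes out in order of first occurrence as in the question's example.

```python
class TrieNode:
    """One decimal digit per level; `count` is the number of values ending here."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = [None] * 10  # fixed slots for digits 0-9, no hashing
        self.count = 0


def _walk(root, value):
    """Follow (creating as needed) the path for value's digits, MSD first."""
    node = root
    for ch in str(value):  # assumes value >= 0, so every character is a digit
        d = ord(ch) - ord("0")
        if node.children[d] is None:
            node.children[d] = TrieNode()
        node = node.children[d]
    return node


def duplicates_with_trie(values):
    root = TrieNode()
    # Pass 1: insert every value and count how often each one occurs.
    for v in values:
        _walk(root, v).count += 1

    # Pass 2 (an addition to the description above): emit each duplicated
    # value once, at its first occurrence in the input.
    out = []
    for v in values:
        node = _walk(root, v)
        if node.count > 1:
            out.append(v)
            node.count = 0  # prevents emitting the same value again
    return out


print(duplicates_with_trie([11, 29, 81, 14, 43, 43, 81, 29]))  # [29, 81, 43]
```

Both passes touch one node per digit, so the total work stays at O(m N).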


If your inputs are all small integers you can use a counting sort, which runs in O(n + m) time and requires O(m) space, where m is the size of the range of possible inputs.

As a space optimization it is enough to use a bit array and use a single bit (rather than a count) to store whether you have seen that item before or not.
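
A rough sketch of the bit-array variant, assuming all inputs are integers in 0..max_value. The function names and the `max_value` parameter are hypothetical. A second bit array (not mentioned above) is used so that a value occurring three or more times is reported only once. Note that duplicates come out in the order their repeats are detected, which for the question's example is 43, 81, 29 rather than 29, 81, 43.

```python
def _test_and_set(bits, v):
    """Return whether bit v was already set, then set it."""
    byte, mask = v >> 3, 1 << (v & 7)
    was_set = (bits[byte] & mask) != 0
    bits[byte] |= mask
    return was_set


def duplicates_with_bit_array(values, max_value):
    size = (max_value >> 3) + 1   # enough bytes to hold bits 0..max_value
    seen = bytearray(size)        # one bit per possible value
    reported = bytearray(size)    # extra bits so each duplicate is listed once
    out = []
    for v in values:
        # `and` short-circuits: `reported` is only touched for repeated values.
        if _test_and_set(seen, v) and not _test_and_set(reported, v):
            out.append(v)
    return out


print(duplicates_with_bit_array([11, 29, 81, 14, 43, 43, 81, 29], 81))  # [43, 81, 29]
```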


It sounds like you're averse to allocating any additional space. Nonetheless, a hash table is still the right solution for speed. Honestly, most hash table implementations for simple data such as integers are so overweight from their one-solution-fits-all nature that I just roll my own depending on what I need. It can turn slow code into fast code when you need it for relatively little work.

Also, if your objection to hash tables is that they destroy order then perhaps you may want to use them a little differently to obtain expected O(n) while maintaining order:

Create a hash table that maps your array elements to two bits as a counting field from zero to three, and thirty bits as an index into the array of elements. Unless you've got over a billion values in your array, thirty bits is enough. That way your hash values are just a single 32-bit word.

Go through the elements in the array. If an element isn't in the table, insert the value into the hash table and set the count field to zero. It doesn't matter what the index portion is when you store it. If the element is in the table and the count field is zero, bump it up to 1 and store the element index with the new count field value. If the count field is already one or greater, set it to two and don't touch the stored index -- leave it as it is.

Go through the elements in the array again. Look up each element and if its index is the one stored and the associated count field is more than zero, print it out.

This should yield you what you want in the proper order with O(n) time. But, it uses hash tables which aren't desired for an unknown reason. I highly recommend you either accept a solution such as this one or explain the limitations so that you'll get a more accurately targeted solution.
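
The two passes above could be sketched as follows. This is illustrative only: a Python dict stands in for the hand-rolled hash table, the names are invented, and the input is assumed to have fewer than 2^30 elements so the index fits in 30 bits. Followed literally, the steps store the index of the second occurrence, so for the question's input this prints 43, 81, 29. Storing the index when the element is first inserted (and comparing against that in the second pass) would give 29, 81, 43 instead.

```python
INDEX_BITS = 30
INDEX_MASK = (1 << INDEX_BITS) - 1


def duplicates_with_packed_table(values):
    # value -> packed word: 2-bit count in the high bits, 30-bit index below.
    table = {}

    # Pass 1: build the table.
    for i, v in enumerate(values):
        word = table.get(v)
        if word is None:
            table[v] = 0                          # count 0, index irrelevant
        elif word >> INDEX_BITS == 0:
            table[v] = (1 << INDEX_BITS) | i      # count 1, store this index
        else:
            # Count becomes 2; the stored index is left untouched.
            table[v] = (2 << INDEX_BITS) | (word & INDEX_MASK)

    # Pass 2: emit a value when we reach the index stored for it.
    out = []
    for i, v in enumerate(values):
        word = table[v]
        if (word >> INDEX_BITS) > 0 and (word & INDEX_MASK) == i:
            out.append(v)
    return out


print(duplicates_with_packed_table([11, 29, 81, 14, 43, 43, 81, 29]))  # [43, 81, 29]
```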


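If you know the maximum value, you can use a separate array whose length is the maximum value plus one:

    // Assumes every element of arrayFirst lies in 0..max.
    int[] secondArray = new int[max + 1];          // 0 = not seen yet, 1 = seen
    List<Integer> result = new ArrayList<>();      // needs java.util.List / ArrayList

    for (int i = 0; i < arrayFirst.length; i++) {
        if (secondArray[arrayFirst[i]] == 0) {
            secondArray[arrayFirst[i]] = 1;        // first time this value appears
        } else {
            result.add(arrayFirst[i]);             // seen before: it is a duplicate
        }
    }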


You can do this in O(n), but it requires the array elements to be integers. The space required is on the order of the range of values, which for 32-bit integers can be as large as 2^32 entries. What you need to do is find the max and min of the original array (arrayorig), then make two arrays, (arraynew+) and (arraynew-).

The size of (arraynew+) will be max(arrayorig) - min(arrayorig) + 1 if all values in arrayorig are non-negative (indexing each value by its offset from the minimum), else the size of (arraynew+) will be max(arrayorig) + 1.

The size of (arraynew-) will be zero if all values are non-negative, else it will be the absolute value of min(arrayorig), plus one so that the largest index is valid.

Then iterate over arrayorig and increment by 1 the counter at the index corresponding to each value: if the value is non-negative, increment (arraynew+) at that index; if it is negative, increment (arraynew-) at the index equal to its absolute value. Afterwards, all the indexes of (arraynew+) and (arraynew-) with a count greater than 1 are the duplicate values of arrayorig.
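
A small sketch of this two-array counting scheme, with invented names. For simplicity it always sizes the positive array as max + 1 rather than using the offset-from-minimum optimization. Reading the duplicates off the count arrays would list them in value order, so this sketch adds a second pass over the original array (not part of the description above) to recover the input order.

```python
def duplicates_with_count_arrays(values):
    if not values:
        return []
    hi, lo = max(values), min(values)
    pos = [0] * (max(hi, 0) + 1)    # counts for values >= 0, indexed by value
    neg = [0] * (max(-lo, 0) + 1)   # counts for values < 0, indexed by abs(value)

    # Counting pass.
    for v in values:
        if v >= 0:
            pos[v] += 1
        else:
            neg[-v] += 1

    # Second pass over the input so duplicates appear in first-occurrence order.
    out = []
    for v in values:
        counts, idx = (pos, v) if v >= 0 else (neg, -v)
        if counts[idx] > 1:
            out.append(v)
            counts[idx] = 0         # reset so the value is emitted only once
    return out


print(duplicates_with_count_arrays([11, 29, 81, 14, 43, 43, 81, 29]))  # [29, 81, 43]
```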


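    #include <stdio.h>
    #include <stdlib.h>

    /* Assumes every value lies in 1..size-1, so each value is a valid index.
       A value v is marked as seen by negating arr[v]; this modifies the input.
       A value that occurs k times is printed k-1 times. */
    void printRepeating(int arr[], int size)
    {
        int i;
        printf("The repeating elements are: \n");
        for (i = 0; i < size; i++)
        {
            int v = abs(arr[i]);      /* original value, even if arr[i] was negated */
            if (arr[v] >= 0)
                arr[v] = -arr[v];     /* first visit: flip the sign at index v */
            else
                printf(" %d ", v);    /* sign already flipped: v is a repeat */
        }
    }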


With comparisons alone, finding duplicates is just as hard as sorting. Your best bet is exploiting some property of your input in order to get an O(N) sort.
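
One such property is fixed-width non-negative integer keys, which allow an LSD radix sort. The sketch below is an illustration under that assumption; the function name and the `key_bits` / `digit_bits` parameters are made up. It radix-sorts the indices by value, finds runs of equal values, flags the first index of each run of length two or more, and then scans the flags in index order. That last step replaces the second index sort from the question, and no hash table is used.

```python
def duplicates_via_radix_sort(values, key_bits=32, digit_bits=8):
    """Assumes every value is an integer with 0 <= value < 2**key_bits."""
    n = len(values)
    order = list(range(n))
    mask = (1 << digit_bits) - 1

    # LSD radix sort of the indices, one digit_bits-wide digit per pass.
    # Each pass is a stable bucket distribution, so ties stay in index order.
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(mask + 1)]
        for i in order:
            buckets[(values[i] >> shift) & mask].append(i)
        order = [i for bucket in buckets for i in bucket]

    # Flag the first (smallest) index of every run of two or more equal values.
    first_of_duplicate = [False] * n
    k = 0
    while k < n:
        j = k + 1
        while j < n and values[order[j]] == values[order[k]]:
            j += 1
        if j - k > 1:
            first_of_duplicate[order[k]] = True
        k = j

    # Scanning the flags in index order restores the original input order.
    return [values[i] for i in range(n) if first_of_duplicate[i]]


print(duplicates_via_radix_sort([11, 29, 81, 14, 43, 43, 81, 29]))  # [29, 81, 43]
```

With a fixed key width the number of passes is a constant (four with the defaults), so the whole procedure is linear in N.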
