Finding duplicates in O(n) time and O(1) space

2023-02-28 06:41 问答作者：

Input: Given an array of n elements which contains elements from 0 to n-1, with any of these numbers appearing any number of times.

Goal : To 开发者_JAVA百科find these repeating numbers in O(n) and using only constant memory space.

For example, let n be 7 and array be {1, 2, 3, 1, 3, 0, 6}, the answer should be 1 & 3. I checked similar questions here but the answers used some data structures like HashSet etc.

Any efficient algorithm for the same?

This is what I came up with, which doesn't require the additional sign bit:

for i := 0 to n - 1
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 0 to n - 1
    if A[i] != i then 
        print A[i]
    end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that it may not look O(n) at first blush, but it is - although it has a nested loop, it still runs in O(N) time. A swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The second loop prints the values of x for which A[x] doesn't equal x - since the first loop guarantees that if x exists at least once in the array, one of those instances will be at A[x], this means that it prints those values of x which are not present in the array.

(Ideone link so you can play with it)

caf's brilliant answer prints each number that appears k times in the array k-1 times. That's useful behaviour, but the question arguably calls for each duplicate to be printed once only, and he alludes to the possibility of doing this without blowing the linear time/constant space bounds. This can be done by replacing his second loop with the following pseudocode:

for (i = 0; i < N; ++i) {
    if (A[i] != i && A[A[i]] == A[i]) {
        print A[i];
        A[A[i]] = i;
    }
}

This exploits the property that after the first loop runs, if any value m appears more than once, then one of those appearances is guaranteed to be in the correct position, namely A[m]. If we are careful we can use that "home" location to store information about whether any duplicates have been printed yet or not.

In caf's version, as we went through the array, A[i] != i implied that A[i] is a duplicate. In my version, I rely on a slightly different invariant: that A[i] != i && A[A[i]] == A[i] implies that A[i] is a duplicate that we haven't seen before. (If you drop the "that we haven't seen before" part, the rest can be seen to be implied by the truth of caf's invariant, and the guarantee that all duplicates have some copy in a home location.) This property holds at the outset (after caf's 1st loop finishes) and I show below that it's maintained after each step.

As we go through the array, success on the A[i] != i part of the test implies that A[i] could be a duplicate that hasn't been seen before. If we haven't seen it before, then we expect A[i]'s home location to point to itself -- that's what's tested for by the second half of the if condition. If that's the case, we print it and alter the home location to point back to this first found duplicate, creating a 2-step "cycle".

To see that this operation doesn't alter our invariant, suppose m = A[i] for a particular position i satisfying A[i] != i && A[A[i]] == A[i]. It's obvious that the change we make (A[A[i]] = i) will work to prevent other non-home occurrences of m from being output as duplicates by causing the 2nd half of their if conditions to fail, but will it work when i arrives at the home location, m? Yes it will, because now, even though at this new i we find that the 1st half of the if condition, A[i] != i, is true, the 2nd half tests whether the location it points to is a home location and finds that it isn't. In this situation we no longer know whether m or A[m] was the duplicate value, but we know that either way, it has already been reported, because these 2-cycles are guaranteed not to appear in the result of caf's 1st loop. (Note that if m != A[m] then exactly one of m and A[m] occurs more than once, and the other does not occur at all.)

Here is the pseudocode

for i <- 0 to n-1:
   if (A[abs(A[i])]) >= 0 :
       (A[abs(A[i])]) = -(A[abs(A[i])])
   else
      print i
end for

Sample code in C++

For relatively small N we can use div/mod operations

n.times do |i|
  e = a[i]%n
  a[e] += n
end

n.times do |i| 
  count = a[i]/n
  puts i if count > 1
end

Not C/C++ but anyway

http://ideone.com/GRZPI

Not really pretty but at least it's easy to see the O(N) and O(1) properties. Basically we scan the array and, for each number we see if the corresponding position has been flagged already-seen-once (N) or already-seen-multiple-times (N+1). If it is flagged already-seen-once, we print it and flag it already-seen-multiple-times. If it is not flagged, we flag it already-seen-once and we move the original value of the corresponding index to the current position (flagging is a destructive operation).

for (i=0; i<a.length; i++) {
  value = a[i];
  if (value >= N)
    continue;
  if (a[value] == N)  {
    a[value] = N+1; 
    print value;
  } else if (a[value] < N) {
    if (value > i)
      a[i--] = a[value];
    a[value] = N;
  }
}

or, better yet (faster, despite the double loop):

for (i=0; i<a.length; i++) {
  value = a[i];
  while (value < N) {
    if (a[value] == N)  {
      a[value] = N+1; 
      print value;
      value = N;
    } else if (a[value] < N) {
      newvalue = value > i ? a[value] : N;
      a[value] = N;
      value = newvalue;
    }
  }
}

Let's assume that we present this array as a uni-directional graph data structure - each number is a vertex and its index in the array points to another vertex forming an edge of the graph.

For even more simplicity we have indices 0 to n-1 and range of number from 0..n-1. e.g.

   0  1  2  3  4 
 a[3, 2, 4, 3, 1]

0(3) --> 3(3) is a cycle.

Answer: Just traverse the array relying on indices. if a[x] = a[y] then it's a cycle and thus duplicate. Skip to the next index and continue again and so forth until the end of of an array. Complexity: O(n) time and O(1) space.

Check out the explanation here https://youtu.be/qJ_Y7pKP0e4

code here https://github.com/TechieExpress/DataStructures/blob/main/findDuplicates

Code snippet:

/**
*
* @author techieExpress
*
* You are given a list of n-1 integers and these integers are in the range * of 1 to n.
* Input: Given an array of n elements which contains elements 
* from 0 to n-1, with any of these numbers appearing any number of times.
* 
* Goal: To find these repeating numbers in O(n) and using only constant  * * memory space.
**/

public class findDuplicates {
    
    
    public static void main(String args[])
    {
        int arr[] = { 2,1,1,2 };
  
        for (int i = 0; i < arr.length; i++) {
            arr[arr[i] % arr.length]
                = arr[arr[i] % arr.length]
                  + arr.length;
        }
        System.out.println("The repeating elements are : ");
        for (int i = 0; i < arr.length; i++) {
            
            //System.out.print(numRay[i]);
            if (arr[i] >= arr.length * 2) {
                System.out.println(i + " ");
                arr[i]=arr[i]%arr.length;
            }
        }
    }

}

One solution in C is:

#include <stdio.h>

int finddup(int *arr,int len)
{
    int i;
    printf("Duplicate Elements ::");
    for(i = 0; i < len; i++)
    {
        if(arr[abs(arr[i])] > 0)
          arr[abs(arr[i])] = -arr[abs(arr[i])];
        else if(arr[abs(arr[i])] == 0)
        {
             arr[abs(arr[i])] = - len ;
        }
        else
          printf("%d ", abs(arr[i]));
    }

}
int main()
{   
    int arr1[]={0,1,1,2,2,0,2,0,0,5};
    finddup(arr1,sizeof(arr1)/sizeof(arr1[0]));
    return 0;
}

It is O(n) time and O(1) space complexity.

We can do it O(n) time and O(1) space complexity by -

take the ith array element.
Make it +ve if it's negative
Last, multiply with -1 to the number getting from array index (ith element).

If the number positive, return the index.

 def findDuplicate(self, arr: List[int]) -> int:
     n=len(arr)
     for i in range(0,n):

         arr[(abs(arr[i]))-1]=arr[(abs(arr[i]))-1]*(-1)
         if arr[(abs(arr[i]))-1]>0:
             return abs(arr[i])

Algorithm can be readily seen in the following C function. Retrieving original array, although not required, will be possible taking each entry modulo n.

void print_repeats(unsigned a[], unsigned n)
{
    unsigned i, _2n = 2*n;
    for(i = 0; i < n; ++i) if(a[a[i] % n] < _2n) a[a[i] % n] += n;
    for(i = 0; i < n; ++i) if(a[i] >= _2n) printf("%u ", i);
    putchar('\n');
}

Ideone Link for testing.

static void findrepeat()
{
    int[] arr = new int[7] {0,2,1,0,0,4,4};

    for (int i = 0; i < arr.Length; i++)
    {
        if (i != arr[i])
        {
            if (arr[i] == arr[arr[i]])
            {
                Console.WriteLine(arr[i] + "!!!");
            }

            int t = arr[i];
            arr[i] = arr[arr[i]];
            arr[t] = t;
        }
    }

    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();

    for (int j = 0; j < arr.Length; j++)
    {
        if (j == arr[j])
        {
            arr[j] = 1;
        }
        else
        {
            arr[arr[j]]++;
            arr[j] = 0;
        }
    }

    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();
}

private static void printRepeating(int arr[], int size) {
        int i = 0;
        int j = 1;
        while (i < (size - 1)) {
            if (arr[i] == arr[j]) {
                System.out.println(arr[i] + " repeated at index " + j);
                j = size;
            }
            j++;
            if (j >= (size - 1)) {
                i++;
                j = i + 1;
            }
        }

    }

If the array is not too large this solution is simpler, It creates another array of same size for ticking.

1 Create a bitmap/array of same size as your input array

 int check_list[SIZE_OF_INPUT];
 for(n elements in checklist)
     check_list[i]=0;    //initialize to zero

2 scan your input array and increment its count in the above array

for(i=0;i<n;i++) // every element in input array
{
  check_list[a[i]]++; //increment its count  
}

3 Now scan the check_list array and print the duplicate either once or as many times they have been duplicated

for(i=0;i<n;i++)
{

    if(check_list[i]>1) // appeared as duplicate
    {
        printf(" ",i);  
    }
}

Of course it takes twice the space consumed by solution given above ,but time efficiency is O(2n) which is basically O(n).

继续阅读：algorithm c

Finding duplicates in O(n) time and O(1) space

Ideone Link for testing.

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Ideone Link for testing.

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？