Find the missing and duplicate elements in an array in linear time and constant space
You’re given an array of N 64 bit integers. N may be very large. You know that every integer 1..N appears once in the array, except there is one integer missing and one integer duplicated.
Write a linear time algorithm to find the missing and duplicated numbers. Further, your algorithm should run in small constant space and leave the array untouched.开发者_如何学运维
Source: http://maxschireson.com/2011/04/23/want-a-job-working-on-mongodb-your-first-online-interview-is-in-this-post/
If all numbers were present in the array the sum would be N(N+1)/2
.
Determine the actual sum by summing up all numbers in the array in O(n), let this be Sum(Actual)
.
One number is missing, let this be j
and one number is duplicated, let this be k
. That means that
Sum(Actual) = N(N+1)/2 + k - j
derived from that
k = Sum(Actual) -N(N+1)/2 + j
Also we can calculate the sum of squares in the array, which would sum up to n3/3 + n2/2 + n/6 if all numbers were present.
Now we can calculate the actual sum of squares in O(n), let this be Sum(Actual Squares)
.
Sum(Actual Squares) =n3/3 + n2/2 + n/6 + k2 - j2
Now we have two equations with which we can determine j
and k
.
The XOR trick works in two passes with a read-only array.
This avoids the problem of possible integer overflows which the sum and sum of squares solution has.
Let the two numbers be x
and y
, one of which is the missing number and the other repeated.
XOR all the elements of the array, along with 1,2,...,N
.
The result is w = x XOR y
.
Now since x
and y
are distinct, w
is non-zero.
Pick any non-zero bit of w
. x
and y
differ in this bit. Say the position of the bit is k
.
Now consider a split of the array (and the numbers 1,2,...,N
) into two sets, based on whether the bit at position k is 0 or 1.
Now if we compute the XOR (separately) of the elements of the two sets, the result has to be x
and y
.
Since the criteria for splitting is just checking if a bit is set of not, we can compute the two XORs of the two sets by making another pass through the array and having two variables, each of which holds the XOR of the elements seen so far (and 1,2,...N
), for each set. At the end, when we are done, those two variables will hold x
and y
.
Related:
Finding missing elements in an array which can be generalized to m appearing twice and m missing.
Find three numbers appeared only once which is about three missing.
Using the basic idea from a related interview question you could do:
- Sum up all the numbers (we shall call this
S1
) and their squares (S2
) - Compute the expected sum of the numbers, without modifications, i.e.
E1 = n*(n+1)/2
andE2 = n*(n+1)*(2n+1)/6
- Now you know that
E1 - S1 = d - m
andE2 - S2 = d^2 - m^2
, whered
is the duplicated number andm
the missing one. Solve this system of equations and you'll find that:
m = 1/2 ((E2 - S2)/(E1 - S1) - (E1 - S1)) d = 1/2 ((E2 - S2)/(E1 - S1) + (E1 - S1)) // or even simpler: d = m + (E1 - S1)
.
$S1 = $S2 = 0;
foreach ($nums as $num) {
$S1 += $num;
$S2 += $num * $num;
}
$D1 = $n * ($n + 1) / 2 - $S1;
$D2 = $n * ($n + 1) * (2 * $n + 1) / 6 - $S2;
$m = 1/2 * ($D2/$D1 - $D1);
$d = 1/2 * ($D2/$D1 + $D1);
Here is a Java implementation based on @Aryabhatta 's idea:
Input:[3 1 2 5 3]
Output:[3, 4]
public ArrayList<Integer> repeatedNumber(final List<Integer> A) {
ArrayList<Integer> ret = new ArrayList<>();
int xor = 0, x = 0, y = 0;
for(int i=0; i<A.size(); i++) {
xor ^= A.get(i);
}
for(int i=1; i<=A.size(); i++) {
xor ^= i;
}
int setBit = xor & ~(xor-1);
for(int i=0; i<A.size(); i++) {
if((A.get(i) & setBit) != 0) {
x ^= A.get(i);
} else {
y ^= A.get(i);
}
}
for(int i=1; i<=A.size(); i++) {
if((i & setBit) != 0) {
x ^= i;
} else {
y ^= i;
}
}
for(int i=0; i<A.size(); i++) {
if(A.get(i) == x) {
ret.add(x);
ret.add(y);
return ret;
}
if(A.get(i) == y) {
ret.add(y);
ret.add(x);
return ret;
}
}
return ret;
}
The solution proposed by BrokenGlass covers the case for two unknowns (corresponding to one duplicate number and one missing number), using two formulas:
and
The formulas yield the generalized harmonic number of orders n of -1 and -2, respectively. (Power series)
This solution is generalizable for 3 unknowns by including the value of generalized harmonic number of order n of -3.
To solve for m unknowns (duplicates and missing numbers), use m generalized harmonic numbers of orders n of -1 through -m.
Moron notes that this approach was discussed earlier in StackOverflow at Easy interview question got harder.
Taking the leave the array untouched
requirement literally (i.e. the array can be temporarily modified as long as it does not change in the end), a programming-oriented solution can be suggested.
I assume that array size N is much smaller than 2^64 which is an utterly unrealistic amount of memory. So we can safely assume that N < 2^P
such that P << 64
(significantly smaller). In other words this means that all numbers in the array have some high bits unused. So let's just use the highest bit as a flag whether the index of that position has been seen in the array. The algorithm goes as follows:
set HIGH = 2^63 // a number with only the highest bit set
scan the array, for each number k do
if array[k] < HIGH: array[k] = array[k] + HIGH // set the highest bit
else: k is the duplicate
for each i in 1..N do
if array[i] < HIGH: i is missing
else: array[i] = array[i] - HIGH // restore the original number
This is linear time and very little computation
long long int len = A.size();
long long int sumOfN = (len * (len+1) ) /2, sumOfNsq = (len * (len +1) *(2*len +1) )/6;
long long int missingNumber1=0, missingNumber2=0;
for(int i=0;i<A.size(); i++){
sumOfN -= (long long int)A[i];
sumOfNsq -= (long long int)A[i]*(long long int)A[i];
}
missingno = (sumOfN + sumOfNsq/sumOfN)/2;
reaptingNO = missingNumber1 - sumOfN;
Psuedo code assuming the set is sorted
missing = nil
duplicate = nil
for i = 0, i < set.size - 1, i += 1
if set[i] == set[i + 1]
duplicate = set[i]
else if((set[i] + 1) != set[i+1])
missing = set[i] + 1
if missing != nil && duplicate != nil
break
return (missing, duplicate)
精彩评论