Determine if A+B=C exists in an array of n integers
This is a problem a friend of mine received as his homework (in algorithm and data structure class). He asked me about this. However, I can't solve it and have been thinking about it for some time during the last several days.
There are n random integers in the range [0, 231-1] (there may be duplicates. Determine if 3 numbers of these numbers satisfy A + B = C.
I first came up with a naive algorithm that's O(n2l开发者_StackOverflow社区og n). I then came up with an algorithm that's O(n2). Here is the pseudo code:
sort(a); // non-descending
for (i = 0; i < n; i++) {
j = i; k = i + 1;
while (j < n && k < n) {
if (a[i] + a[j] == a[k])
return true;
else if (a[i] + a[k] < a[j])
k++;
else
j++;
}
}
return false;
However, the problem states that 1 < n <= 106. I believe O(n2) is too slow. My solution doesn't make use of the randomness. However, I'm not sure if this is an important part of the problem.
The general problem is 3SUM-Hard and the question of whether there is a better than quadratic algorithm is open.
So if you need a faster algorithm, you would probably need to make use of the fact that they are 32-bit.
If numbers are random, any worst-case O(n^2)
algorithm (including yours) will run very fast. In fact, the practical complexity will be O(n*logn)
(the complexity of sorting).
It's much like quicksort, where we have O(n*logn)
average and a tiny chance of hitting O(n^2)
.
10^6
random numbers give us ~ 10^6*10^6
'nearly random' sums in range ~ 0..10^9
. What's the chance that one of those 10^12
random sums will be equal to a given random value in integer range? Pretty good.
Now, what's the chance that one of those 10^12
random sums will be equal to a one of 10^6 given random values? 100%, speaking poetically.
I've implemented your proposed solution, for n = 10^6
it performs on average 5000-10000
operations in the innermost loop. So much for O(n^2)
. Sorting is the costliest operation in there.
PS. You may reduce complexity farther and make it even O(1)
, if you update the solution to use hash instead of sorting.
PS 2. The test program in java, for the reference. Run it and see for yourself.
int n = 1000000;
int[] a = new int[n];
// generate random array
Random r = new Random();
for (int i = 0; i < n; ++i) {
do {
a[i] = r.nextInt();
} while (a[i] < 0);
}
Arrays.sort(a);
// number of operations inside main loop
int ops = 0;
// main logic, pretty much as OP described it
boolean found = false;
for (int i = 0; i < n && !found; ++i) {
int j = i;
int k = i + 1;
while (k < n) {
++ops;
if (a[i] > a[k] - a[j]) {
++k;
} else if (a[i] < a[k] - a[j]) {
++j;
} else {
System.out.println(a[i] + " + " + a[j] + " = " + a[k]);
found = true;
break;
}
}
}
System.out.println(ops);
Algorithm that uses hashing takes 10-900 microseconds in Python (average: 200 median: 60):
#!/usr/bin/env python
import random
L = frozenset(random.sample(xrange(2**31), 10**6))
print next(((a,b,a+b) for a in L for b in L if (a + b) in L), None)
It is O(N**2)
but it seems it is fast enough.
For comparison, the amortized O(N)
operation of creating the frozenset
takes 270
milliseconds (1000 times slower then the search) and to create random list it takes 0.9
seconds.
Note: random.sample
doesn't return repeated elements if an input sequence contains unique elements therefore frozenset
doesn't discard any elements in the above example. To solve the problem for a random sequence that allows repeated elements we should use two data structures:
#!/usr/bin/env python
import random
L = [random.randrange(2**31) for _ in xrange(10**6)]
S = frozenset(L)
print len(L), len(S)
print next(((a, b, a+b) for a in L for b in L if (a + b) in S), None)
Output
1000000 999762
(2055933464, 83277289, 2139210753)
I'm getting O(n log n) when measuring this over sorted lists:
from bisect import bisect_right
import cProfile as prof
import random
def find3sum(T):
if len(T) < 3:
return None
n = len(T)
top = T[-1]
for i in range(len(T)-1):
b = top - T[i]
if b < T[i]:
return None
k = bisect_right(T, b, i, n-1)
while k > i:
c = T[i] + T[k]
j = bisect_right(T, c, k, n-1)
if j <= k:
break
elif T[j] == c:
return (i, k, j)
else:
k -= 1
def test_one(a):
a = sorted(a)
r = find3sum(a)
i, k , j = r
assert a[i] + a[k] == a[j]
def test():
n = 100000
max = 200000
random.seed(0)
for _ in range(100):
a = [random.randint(0,max) for _x in xrange(n)]
test_one(a)
a = range(n)
test_one(a)
prof.run('test()')
These are the results (about one call to bisect per element):
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 183.764 183.764 <string>:1(<module>)
200 0.005 0.000 89.996 0.450 find2sum.py:25(test_one)
1 17.269 17.269 183.762 183.762 find2sum.py:31(test)
200 35.096 0.175 79.601 0.398 find2sum.py:5(find3sum)
10000000 44.958 0.000 52.398 0.000 random.py:160(randrange)
10000000 23.891 0.000 76.289 0.000 random.py:224(randint)
1 0.000 0.000 0.000 0.000 random.py:99(seed)
19599982 44.077 0.000 44.077 0.000 {_bisect.bisect_right}
1 0.000 0.000 0.000 0.000 {function seed at 0x9a1972c}
600 0.001 0.000 0.001 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10000000 7.440 0.000 7.440 0.000 {method 'random' of '_random.Random' objects}
301 0.635 0.002 0.635 0.002 {range}
200 10.390 0.052 10.390 0.052 {sorted}
There are several optimizations that can considerably reduce the running time (like skipping over runs of numbers equal to the one already tested).
A+B=C, hence B=C-A or A=C-B
The above problem can be done in O(n) complexity by using hash table.
var C; // the sum you are looking for
for(each element)
X = C - element
boolean exists = lookup for X in hash table
if (exists) combination A+B=C exists in the given input
else hashtable.put(element)
Hope that helps.
精彩评论