Finding the repeated element
In 开发者_运维百科an array with integers between 1 and 1,000,000 or say some very larger value ,if a single value is occurring twice twice. How do you determine which one?
I think we can use a bitmap to mark the elements , and then traverse allover again to find out the repeated element . But , i think it is a process with high complexity.Is there any better way ?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Guass' formula we can quickly compute the expected value of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and sum their values. The different between this sum and the expected sum is the duplicate value.
EDIT: As other's have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average - which establishes an upper bound for find the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
you may try to use bits as hashmap:
1 at position k means that number k occured before
0 at position k means that number k did not occured before
pseudocode:
0. assume that your array is A
1. initialize bitarray(there is nice class in c# for this) of 1000000 length filled with zeros
2. for each num in A:
if bitarray[num]
return num
else
bitarray[num] = 1
end
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But its easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.
精彩评论