How to identify the duplicated number in an array with minimum complexity?
There is an array of size 10,000. It stores the numbers 1 to 10,000 in random order.
Each number occurs exactly once. Now suppose one number is removed from the array and some other number is duplicated in its place.
How can we identify which number is duplicated, with minimum complexity?
NOTE: We cannot use another array.
The fastest way is an O(N) in-place pigeonhole sort.
Start at the first location of the array, a[0]. Say it has the value 5. You know that 5 belongs at a[4], so swap locations 0 and 4. Now a new value is in a[0]. Swap it to where it needs to go.
Repeat until a[0] == 1, then move on to a[1] and swap until a[1] == 2, etc.
If at any point you end up attempting to swap two identical values, then you have found the duplicate!
Runtime: O(N) with a very low coefficient and early exit. Storage required: zero.
Bonus optimization: count how many swaps have occurred and exit early if n_swaps == array_size. This resulted in a 15% improvement when I implemented a similar algorithm for permuting a sequence.
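The swapping procedure above can be sketched roughly as follows. This is a minimal Python sketch, not the answerer's original code: the function name find_duplicate_by_swapping is made up for illustration, and it assumes the list holds values from 1 to len(a) with exactly one value duplicated. Note that it reorders the list in place.

```python
def find_duplicate_by_swapping(a):
    """Return the duplicated value, or None if every value is unique.

    Assumes (hypothetically) that `a` holds values in 1..len(a).
    The list is rearranged in place; no second array is allocated.
    """
    i = 0
    while i < len(a):
        v = a[i]
        if v == i + 1:
            # The value already sits in its home slot; move on.
            i += 1
            continue
        target = v - 1  # value v belongs at index v - 1
        if a[target] == v:
            # We would be swapping two identical values: v is the duplicate.
            return v
        # Send v home and bring whatever was there back to slot i.
        a[i], a[target] = a[target], a[i]
    return None


if __name__ == "__main__":
    data = [2, 3, 1, 2]  # 4 was removed, 2 was duplicated
    print(find_duplicate_by_swapping(data))
```

Every swap puts one value into its final slot, so there are at most N swaps in total, which is where the O(N) bound comes from.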
Compute the sum and the sum of the squares of the elements (you will need 64-bit values for the sum of the squares). From these you can recover which element was modified:
Subtract the expected values for the unmodified array. If x was removed and y duplicated, you get the difference y - x for the sum and y^2 - x^2 = (y + x)(y - x) for the sum of squares. Dividing the second difference by the first gives y + x, and from y - x and y + x it is easy to recover x and y.
Edit: Note that this may be faster than pigeonhole sort, because it runs linearly over the array and is thus more cache friendly.
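For what it's worth, here is a small Python sketch of the sum / sum-of-squares idea. The function name find_removed_and_duplicated is hypothetical, and it assumes the values are 1..N with exactly one value replaced by a copy of another. Python integers do not overflow, so the 64-bit caveat only matters if you port this to a language with fixed-width integers.

```python
def find_removed_and_duplicated(a):
    """Return (x, y): x is the removed value, y the duplicated one.

    Assumes (hypothetically) that `a` is 1..len(a) with x replaced by y.
    Returns None if the sums match an unmodified array.
    """
    n = len(a)
    expected_sum = n * (n + 1) // 2
    expected_squares = n * (n + 1) * (2 * n + 1) // 6

    diff = sum(a) - expected_sum                         # y - x
    diff_sq = sum(v * v for v in a) - expected_squares   # y^2 - x^2

    if diff == 0:
        return None  # nothing was replaced

    total = diff_sq // diff  # (y^2 - x^2) / (y - x) = y + x, exact division
    y = (total + diff) // 2
    x = (total - diff) // 2
    return x, y


if __name__ == "__main__":
    print(find_removed_and_duplicated([3, 1, 3, 4]))
```

Unlike the swapping approach, this leaves the array untouched and also tells you which number went missing.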
Why not simply use a second array or another data structure such as a hash table (depending on the memory/performance tradeoff)? This second structure would simply store the count of each number in the original array. Then add an increment/decrement of the counts to the access function of the original array, and you have your information immediately.
PS: When you wrote "we can not use another array", I assume you meant that you cannot change the ORIGINAL data structure. However, the use of additional data structures is possible.
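One way to read the counting suggestion, as a rough Python sketch: wrap the array so that every write keeps a table of counts up to date. The class CountedArray and its method names are invented for illustration, and this only applies if extra storage is actually allowed.

```python
class CountedArray:
    """Hypothetical wrapper that tracks how often each value occurs."""

    def __init__(self, values):
        self._values = list(values)
        self._counts = {}
        for v in self._values:
            self._counts[v] = self._counts.get(v, 0) + 1

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, new_value):
        # The "+/-" step: decrement the old value's count, increment the new one's.
        old_value = self._values[index]
        self._counts[old_value] -= 1
        self._counts[new_value] = self._counts.get(new_value, 0) + 1
        self._values[index] = new_value

    def duplicates(self):
        """Values that currently occur more than once."""
        return [v for v, c in self._counts.items() if c > 1]

    def missing(self):
        """Values that were present at some point but now occur zero times."""
        return [v for v, c in self._counts.items() if c == 0]


if __name__ == "__main__":
    arr = CountedArray([1, 2, 3, 4])
    arr[1] = 3  # overwrite 2 with a second 3
    print(arr.duplicates(), arr.missing())
```

This costs O(N) extra memory, so it only helps if the "no other array" restriction is read as loosely as this answer assumes.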
Sort the array, then iterate through until you hit two of the same number in a row.
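A minimal Python sketch of the sort-and-scan approach, with a made-up function name. It is O(N log N) because of the sort, and it reorders the caller's list. Python's built-in sort may also use temporary memory internally, so a strictly in-place sort (heapsort, for example) would be needed if that matters.

```python
def find_duplicate_by_sorting(a):
    """Sort `a` in place, then return the first value that repeats.

    Returns None if no two adjacent values are equal after sorting.
    """
    a.sort()
    for i in range(1, len(a)):
        if a[i] == a[i - 1]:
            return a[i]
    return None


if __name__ == "__main__":
    print(find_duplicate_by_sorting([2, 3, 1, 2]))
```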