Finding the repeated element

2023-01-18 16:15 问答作者：

In 开发者_运维百科an array with integers between 1 and 1,000,000 or say some very larger value ,if a single value is occurring twice twice. How do you determine which one?

I think we can use a bitmap to mark the elements , and then traverse allover again to find out the repeated element . But , i think it is a process with high complexity.Is there any better way ?

This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.

What calculations can you do on a range of integers whose answer you can determine ahead of time?

Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)

EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.

We know that all of the values between 1 and N must exist in the list. Using Guass' formula we can quickly compute the expected value of a range of integers:

Sum(1..N) = 1/2 * (1 + N) * Count(1..N).

Since we know the expected sum, all we have to do is loop through all the values and sum their values. The different between this sum and the expected sum is the duplicate value.

EDIT: As other's have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.

If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average - which establishes an upper bound for find the duplicate in this manner.

If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.

you may try to use bits as hashmap:

1 at position k means that number k occured before

0 at position k means that number k did not occured before

pseudocode:

0. assume that your array is A
1. initialize bitarray(there is nice class in c# for this) of 1000000 length filled with zeros
2. for each num in A:
   if bitarray[num] 
      return num
   else
      bitarray[num] = 1
   end

The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.

Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.

Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.

But its easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.

Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.

Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

继续阅读：algorithm

Finding the repeated element

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？