
Dealing with M occurrences among N

Question I've been given at a job interview. I was close to the solution but, unfortunately, did not solve it.

Assume we have a sequence of N numbers of type long, and we know for sure that every number in this sequence occurs exactly n times, except for one number that occurs exactly m times (0 < m < n). How do we find that number with O(N) operations and O(1) additional memory?

For the simplest case (when n = 2 and m = 1) all we need to do is XOR every number in the sequence together; the result equals the desired number. But I'm stuck trying to deal with arbitrary m and n.

I would appreciate an actual C++ solution.


EDIT: We know the actual values of m and n a priori.

Example. We know that n = 3 and m = 2. The sequence (N = 8) is

5 11 5 2 11 5 2 11

And the right answer is 2 in this particular case because it occurs only twice.


You do 64 sums, one for each bit. For each sum you calculate sum mod n; the remainder is m for every bit that should be set in the result and 0 for every bit that should not be set.

Example:
n = 3, m = 2. list = [5 11 5 2 11 5 2 11]

              5  11   5   2  11   5   2  11
sum of bit 0: 1 + 1 + 1 + 0 + 1 + 1 + 0 + 1 = 6   6 % 3 = 0
sum of bit 1: 0 + 1 + 0 + 1 + 1 + 0 + 1 + 1 = 5   5 % 3 = 2
sum of bit 2: 1 + 0 + 1 + 0 + 0 + 1 + 0 + 0 = 3   3 % 3 = 0
sum of bit 3: 0 + 1 + 0 + 0 + 1 + 0 + 0 + 1 = 3   3 % 3 = 0

So only bit 1 is set, which means the result is 2.
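
In C++ this could look roughly like the following. It is a minimal sketch of the bit-counting idea above, treating the values as unsigned 64-bit integers (the function name findSpecial is my own):

#include <cstdint>
#include <vector>

// For each of the 64 bit positions, count how many values have that bit set.
// Every n-times value contributes a multiple of n to each of its bits, so
// count % n is non-zero exactly for the bits of the m-times value.
uint64_t findSpecial(const std::vector<uint64_t>& values, unsigned n) {
    uint64_t counts[64] = {0};                 // constant additional memory
    for (uint64_t x : values)                  // single pass over the sequence
        for (int bit = 0; bit < 64; ++bit)
            counts[bit] += (x >> bit) & 1;

    uint64_t result = 0;
    for (int bit = 0; bit < 64; ++bit)
        if (counts[bit] % n != 0)              // non-zero remainder: the bit belongs to the answer
            result |= uint64_t(1) << bit;
    return result;
}

For the example above, findSpecial({5, 11, 5, 2, 11, 5, 2, 11}, 3) returns 2. Note that m itself is never needed; all that matters is that m % n != 0, which 0 < m < n guarantees.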

Optimizing implementation:
(Tricks and considerations that are also useful for real problems)
It is worth noting that when iterating over an array, execution speed is to some extent limited by memory access. If one needs to perform multiple operations on each element, it is usually fastest to perform them all on one element at a time, so the processor only needs to load each element from memory once. Interesting blog post on memory and cache.

It is possible to sum multiple bits within a single integer. Rather than applying 64 different bitmasks to get each bit on its own, one could for instance use only 4 bitmasks that each extract 16 bits with 3 bits of space between them. As long as no overflow occurs, a normal addition works exactly as if it were dealing with 16 separate 4-bit integers, so this method handles up to 15 numbers at a time. After 15 numbers have been processed this way, the results must be added into storage capable of holding bigger integers (for example 8 64-bit integers, each holding 8 8-bit counters, which must in turn be emptied into even bigger integers, and so on).
The result is that, instead of doing 64 bitmasks, 63 bitshifts and 64 additions for each value, one only needs 4 bitmasks, 3 bitshifts and 4 additions per value, plus 8 bitmasks, 4 bitshifts and 8 additions for every 15 values, plus 16 bitmasks, 8 bitshifts and 16 additions for every 255 values, etc.

Visualization:
(Summing 4x4-bit integers using 16-bit integers; note that the least significant bit of each 4-bit counter is written on the left)

1000 1000 1000 1000 +
1000 0000 0000 1000 +
0000 0000 0000 1000 +
1000 1000 0000 0000 +
1000 0000 1000 0000 +
0000 0000 1000 1000 =
0010 0100 1100 0010

The result is the same whether you consider this to be 4 columns of 4-bit integers or 1 column of 16-bit integers, but this only holds as long as the 4-bit integers do not overflow.
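
Below is a hedged sketch of this packed-counter trick. The function name and accumulator layout are my own, and for simplicity it spills straight into 64-bit per-bit counters instead of going through the intermediate 8-bit stage described above:

#include <cstdint>
#include <vector>

// Count, for each of the 64 bit positions, how many values have that bit set,
// updating 16 packed 4-bit counters per addition instead of a single counter.
std::vector<uint64_t> bitCounts(const std::vector<uint64_t>& values) {
    const uint64_t MASK = 0x1111111111111111ULL;  // bits 0, 4, 8, ..., 60
    std::vector<uint64_t> counts(64, 0);          // wide per-bit counters
    uint64_t acc[4] = {0, 0, 0, 0};               // 4 x (16 packed 4-bit counters)
    int pending = 0;

    auto flush = [&]() {
        // Spill the packed 4-bit counters into the wide counters.
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 16; ++k)
                counts[4 * k + j] += (acc[j] >> (4 * k)) & 0xF;
        acc[0] = acc[1] = acc[2] = acc[3] = 0;
        pending = 0;
    };

    for (uint64_t x : values) {
        for (int j = 0; j < 4; ++j)
            acc[j] += (x >> j) & MASK;   // one addition updates 16 counters at once
        if (++pending == 15) flush();    // a 4-bit counter can hold at most 15
    }
    flush();                             // spill whatever is left over
    return counts;
}

Once the counts are available, the result is assembled exactly as before: set every bit whose count is not a multiple of n.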


edit) Okay, this method isn't as sound as I initially thought. eBusiness's solution is much simpler and works correctly for cases such as n=4, m=2.

We can generalise the xor method to work with arbitrary m and n. We first need to pick a base b such that gcd(n, b) = b (i.e. b divides n) and gcd(m, b) < b (i.e. b does not divide m). Since even n / odd m pairs satisfy this for base 2, the standard binary xor works for those pairs.

Firstly we define (a^^n) to mean (a^a^...^a) for n a's, with generalised xor of base b. For example, with standard binary xor, a^^2 = 0.

We need to define our generalised xor. The properties we desire are basically the same as those of addition (commutativity, associativity), and we need a^^b = 0. The obvious choice is (x^y) = (x+y)%b applied to each digit of the base-b representation (convince yourself this works, and that it coincides with binary xor for base 2). Then we simply fold this over all numbers in the sequence and end up with result = s^^(m%b), where s is the special number.
Lastly, we need to map this 'xor'ed base-b number back to s itself. We can simply compute i^^(m%b) for i = 0..b-1 and, for each digit of result, look up which digit i of s produced it.

This algorithm is O(N). For each number in the list we do a constant amount of work, because there are at most 64 digits. The reverse lookup at the end is at worst O(N) for large b (since b ≤ n ≤ N). We can do this last step in constant space by recomputing i^^(m%b) for i = 0..b-1 for each digit of the result instead of storing a table (again, there is only a constant number of digits).


Example:

n = 3, m = 2. list = [5 11 5 2 11 5 2 11]

First we choose a base b. Obviously we have to choose base 3.

The xor table for reference:

  0|1|2
0|0|1|2
1|1|2|0
2|2|0|1

The computation (one row per base-3 digit, most significant digit on top):

  5     11      5      2     11      5      2     11
0^0=0. 0^1=1. 1^0=1. 1^0=1. 1^1=2. 2^0=2. 2^0=2. 2^1=0.
0^1=1. 1^0=1. 1^1=2. 2^0=2. 2^0=2. 2^1=0. 0^0=0. 0^0=0.
0^2=2. 2^2=1. 1^2=0. 0^2=2. 2^2=1. 1^2=0. 0^2=2. 2^2=1.

m % b = 2.

Thus we have s^^2 = [001]. We generate a table of i^^2 for each digit i, and then do a reverse lookup.

   i | 0 | 1 | 2 |
i^^2 | 0 | 2 | 1 |

0 -> 0
0 -> 0
1 -> 2

We lastly convert our result back into binary/decimal. [002] = 2.
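
A minimal C++ sketch of this scheme (function and parameter names are mine). It additionally assumes gcd(m % b, b) = 1 so that the final per-digit lookup is unambiguous, which is exactly the failure mode behind the caveat in the edit at the top:

#include <cstdint>
#include <vector>

// Generalised base-b xor: fold the sequence with digit-wise addition mod b,
// then undo the factor of (m % b) digit by digit.
// Assumes b divides n, b does not divide m, and gcd(m % b, b) == 1.
uint64_t findSpecial(const std::vector<uint64_t>& values, unsigned m, unsigned b) {
    unsigned digits[64] = {0};             // at most 64 base-b digits for 64-bit values

    for (uint64_t x : values)
        for (int i = 0; x != 0 && i < 64; ++i, x /= b)
            digits[i] = (digits[i] + x % b) % b;

    // Each surviving digit equals (digit of s) * (m % b) mod b; invert by lookup.
    unsigned mm = m % b;
    uint64_t result = 0, place = 1;
    for (int i = 0; i < 64; ++i, place *= b)   // place wraps only where digits are already 0
        for (unsigned d = 0; d < b; ++d)
            if (d * mm % b == digits[i]) { result += d * place; break; }
    return result;
}

For the worked example, findSpecial({5, 11, 5, 2, 11, 5, 2, 11}, 2, 3) returns 2; note that n itself is not needed once a suitable b has been chosen.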


Your simplest case can be made more general: the same technique you described works for any odd m and even n.


Here's a solution that has the same running time as eBusiness's (which I consider to actually be O(N log N)), but truly uses O(1) words of memory. It assumes m is not a multiple of n. It also assumes two helper functions that count the number of elements strictly above and below their arguments.

// Assumes non-negative values, so comparing bit by bit from the top matches numeric order.
long long divider = 0;

for (int i = 63; i >= 0; i--) {
  divider |= 1LL << i;                           // tentatively set bit i
  long long a = countAbove(divider);             // elements strictly greater than divider
  long long b = countBelow(divider);             // elements strictly smaller than divider
  if (a % n == 0 && b % n == 0) return divider;  // the special element equals divider
  else if (a % n == 0) divider ^= 1LL << i;      // special element is below, so its bit i is 0
}
return divider;  // reached when the special element's lowest bit is 0
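
The assumed helpers could look like the following sketch, each being a single O(N) pass over the input (the container name values is purely illustrative):

#include <vector>

std::vector<long long> values;   // the input sequence, assumed visible to the helpers

// Count the elements strictly greater than x: one linear scan.
long long countAbove(long long x) {
  long long c = 0;
  for (long long v : values) if (v > x) ++c;
  return c;
}

// Count the elements strictly smaller than x: one linear scan.
long long countBelow(long long x) {
  long long c = 0;
  for (long long v : values) if (v < x) ++c;
  return c;
}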


  • If you had a one-to-one hash from the values onto the integers 0 to (N/n) + 1, you could solve it with N iterations plus N/n iterations and O(N) memory usage. However, there is no such one-to-one mapping.

  • If the only constraint on memory is that it be constant, you can define an array the size of the entire domain of longs and solve the problem in 2N operations with constant (albeit gargantuan) memory usage: for every x in the sequence you simply increment BIGARRAY[x], then loop through BIGARRAY looking for the count m. It's terrible and not implementable, but it meets the requirements, and most interview questions are thought experiments anyway.


If the list is sorted then this becomes quite simple: you just examine each run of equal values in turn and check whether its length is m.

If the list is not sorted then I don't think it is possible with O(1) additional memory.


I believe that you can't do it using only O(1) additional space.

Here is my justification: you are given:

  • m
  • n
  • x_1 x_2 .. x_N

Since there are duplicates among the x_i values, let's define U as the set of unique values. All the elements of U appear n times in the x_i series, except one of them, which appears m times. Let us call that less frequent element u_0, and let U_1 be the set U - { u_0 }.

Let S be the sum of all x_i. S can be written as:

sum(x_i) = n * sum(U_1) + m * u_0 = n * sum(U) + (m - n) * u_0

Solving this problem is equivalent to finding the sum of the unique elements of the series, and you can't do that in O(1) additional space: you would need an array or a hash table with linked elements, so the space is actually O(N).


A solution is something like the process of finding the k-th order statistic:

divide the sequence into 2 sub-sequences around a pivot
(calculating the sizes of the sub-sequences during the partition)
while (size of sub-sequence mod n != 0)
  repeat the same partitioning on that sub-sequence

The sub-sequence whose size is not a multiple of n is the one that contains the special number; when only that number is left, you are done. This takes O(N) operations (in expectation), like finding the k-th order statistic.
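
A minimal C++ sketch of this idea, assuming random pivots are acceptable (so the O(N) bound holds in expectation, as with quickselect) and that rearranging the input is allowed; the function name is illustrative:

#include <algorithm>
#include <cstdlib>
#include <vector>

// Repeatedly partition the current range around a pivot. All copies of a value
// land on the same side, so the side whose length is not a multiple of n must
// contain the m-times value. Expected O(N) time, O(1) extra memory.
long long findSpecial(std::vector<long long>& a, int n) {
    std::size_t lo = 0, hi = a.size();
    while (true) {
        long long mn = *std::min_element(a.begin() + lo, a.begin() + hi);
        long long mx = *std::max_element(a.begin() + lo, a.begin() + hi);
        if (mn == mx) return mn;              // only the special value is left

        long long pivot = a[lo + std::rand() % (hi - lo)];
        if (pivot == mn) pivot = mx;          // make sure both sides are non-empty
        auto mid = std::partition(a.begin() + lo, a.begin() + hi,
                                  [pivot](long long v) { return v < pivot; });
        std::size_t left = std::size_t(mid - (a.begin() + lo));
        if (left % n != 0) hi = lo + left;    // special value is in the left part
        else              lo = lo + left;     // otherwise it is in the right part
    }
}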

