
Where can I use the technique from the Majority Vote algorithm?

As seen in the answers to Linear time majority algorithm?, it is possible to compute the majority of an array of elements in linear time and log(n) space.

Everyone who sees this algorithm seems to agree that it is a cool technique. But does the idea generalize to new algorithms?

It seems the hidden power of this algorithm is in keeping a counter that plays a complex role -- such as "(count of majority element so far) - (count of second majority so far)". Are there other algorithms based on the same idea?


Let's first understand why the algorithm works, in order to "isolate" the ideas in it.

The point of the algorithm is that if you have a majority element, then you can match each occurrence of any other element with one occurrence of the majority element, and you still have some occurrences to "spare".

So we just keep a counter which counts the number of "spare" occurrences of our guessed answer. If it reaches 0, then the guessed element is not a majority element of the subsequence that runs from the position where we "elected" it as the guessed major element up to the "current" position. Also, since every occurrence of the guessed element in that subsequence is matched with an occurrence of some other element, no element is a major element of that subsequence.

Now, since:

  1. our algorithm gives a correct answer only if there is a major element, and
  2. if there is a major element, then it is still a major element of what remains when we ignore the "current" subsequence at the moment the counter goes to zero

it is easy to see by contradiction that, if a major element exists, then there is a suffix of the whole sequence on which the counter never gets to zero.
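To make the argument above concrete, here is a minimal sketch of the counter idea in Python. The names `majority_candidate` and `majority_element` are made up for this illustration, and the second pass is only there because the first pass is meaningful only when a major element actually exists.

```python
def majority_candidate(seq):
    """One pass, one candidate, one counter of "spare" occurrences."""
    candidate, spare = None, 0
    for x in seq:
        if spare == 0:
            # The previous subsequence cancelled out completely:
            # elect the current element as the new guess.
            candidate, spare = x, 1
        elif x == candidate:
            spare += 1
        else:
            # Match this element against one spare occurrence of the guess.
            spare -= 1
    return candidate


def majority_element(seq):
    """Return the element occurring more than len(seq) / 2 times, or None."""
    seq = list(seq)
    candidate = majority_candidate(seq)
    # Second pass: the candidate is only a guess unless a majority exists.
    if seq and 2 * seq.count(candidate) > len(seq):
        return candidate
    return None
```

For instance, on `[1, 2, 1, 3, 1]` the counter hits zero twice and the final suffix `[1]` keeps the guess `1`, which the second pass then confirms.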

Now: what is the idea that can be exploited in new O(1) space, O(n) time algorithms?

To me, you can apply this technique whenever you have to compute a property P on a sequence of elements which:

  1. can be extended from seq[n, m] to seq[n, m+1] in O(1) time if Q(seq[n, m+1]) doesn't hold
  2. P(seq[n, m]) can be computed in O(1) time and space from P(seq[n, j]) and P(seq[j, m]) if Q(seq[n, j]) holds

In our case, P is the "spare" occurrences of our "elected" major element and Q is "P is zero".

If you see things in that way, longest common subsequence exploits the same idea (dunno about its "coolness factor" ;))


Jayadev Misra and David Gries have a paper called Finding Repeated Elements (ACM page) which generalizes it to finding an element that repeats more than n/k times (k=2 is the majority problem).

Of course, this is probably very similar to the original problem, and you are probably looking for 'different' algorithms.

Here is an example which is possibly different.

Give an algorithm which will detect if a string of parentheses ( '(' and ')') is well formed.

I believe the standard solution is to maintain a counter.
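A minimal sketch of that counter solution, in Python; the function name `is_well_formed` is made up for this illustration, and characters other than '(' and ')' are simply ignored here, which is an assumption rather than part of the problem statement.

```python
def is_well_formed(s):
    """Check a string of '(' and ')' using a single counter."""
    depth = 0  # number of '(' still waiting for a matching ')'
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared with no unmatched '(' before it.
                return False
    # Well formed only if every '(' has been closed.
    return depth == 0
```

As in the majority algorithm, the counter summarizes everything that matters about the prefix seen so far, so one pass and O(1) extra words are enough.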

Side note:

As for answers which claim this cannot be done in constant space etc., ask them for the model of computation. In the word RAM model, for instance, you assume that integers, array indices etc. take O(1) space.

A lot of folks incorrectly mix and match models. For instance, they will happily have the input array of n integers be O(n), have an array index be O(1) space, but a counter they consider Omega(log n) etc, which is nonsense. If they want to consider the size in bits, then the input itself is Omega(n log n) etc.


For people who want to understand what this algorithm does and why it works: look at my detailed answer.

Here I will describe a natural extension (or generalization) of this algorithm. In the standard majority voting algorithm you have to find an element which appears more than n/2 times in the stream, where n is the size of the stream. You can do this in O(n) time (with a tiny constant) and in O(log(n)) space, and even that is a highly unlikely worst case.


The generalized algorithm allows you to find the k most frequent items, where each item appears more than n/(k+1) times in the original stream. Note that if k=1, you end up with the original problem.

The solution to this problem is really similar to the original one, except that instead of one counter and one possible element, you maintain k counters and k possible elements. The logic goes in a similar way. You iterate through the array, and if the current element is among the possible elements, you increase its counter; if it is not, and one of the counters is zero, you substitute the element of that counter with the new element. Otherwise you decrease all the counters.

As with the original majority voting algorithm, you need a guarantee that these k majority elements exist; otherwise you have to do another pass over the array to verify that the possible elements you found are correct. Here is my Python attempt (I have not tested it thoroughly).

from collections import defaultdict


def majority_element_general(arr, k=1):
    # Maps each current candidate to its counter; at most k candidates at a time.
    counter = defaultdict(int)

    for x in arr:
        if x in counter or len(counter) < k:
            # x is already a candidate, or there is a free slot for it.
            counter[x] += 1
        else:
            # x is not a candidate and all k slots are taken:
            # decrease every counter and drop the ones that reach zero.
            for el in list(counter):
                counter[el] -= 1
                if counter[el] == 0:
                    del counter[el]

    # These are only candidates; you might want a second pass
    # to check that they are really frequent.
    return list(counter)
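The verification pass mentioned above could look something like the sketch below. The helper name `verify_candidates` and its signature are made up for this illustration; it takes the candidate list as an argument and keeps only those that occur more than n/(k+1) times.

```python
from collections import Counter


def verify_candidates(arr, candidates, k=1):
    """Second pass: keep only candidates occurring more than n / (k + 1) times."""
    wanted = set(candidates)
    # Count only the candidates, so the extra space stays O(k).
    counts = Counter(x for x in arr if x in wanted)
    threshold = len(arr) / (k + 1)
    return [c for c in candidates if counts[c] > threshold]
```

With k=1 this is the usual check that a majority candidate really occurs more than n/2 times.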