
How to calculate the mode of an unsorted array of integers in O(N)?

...using an iterative procedure (no hash table)?

It's not homework. And by mode I mean the most frequent number (statistical mode). I don't want to use a hash table because I want to know how it can be done iteratively.


OK Fantius, how bout this?

Sort the list with a RadixSort (BucketSort) algorithm (technically O(N) time; the numbers must be integers). Start at the first element, remember its value and start a count at 1. Iterate through the list, incrementing the count, until you reach a different value. If the count for that value is higher than the current high count, remember that value and count as the mode. If you get a tie with the high count, remember both (or all) numbers.
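The scanning half of that description could be sketched roughly as below. This is only an illustration: the name modes_of_sorted is made up, and the built-in sorted() stands in for the radix sort (it is a comparison sort, so it is O(N log N), not O(N)).

```python
def modes_of_sorted(values):
    """Return (modes, count) for an already sorted list, keeping all ties."""
    if not values:
        return [], 0
    modes = []          # every value whose run length equals the best so far
    best = 0            # longest run seen so far
    run_value = values[0]
    run_length = 0
    for v in values:
        if v == run_value:
            run_length += 1
        else:
            # A different value starts a new run
            run_value = v
            run_length = 1
        if run_length > best:
            best = run_length
            modes = [v]
        elif run_length == best:
            modes.append(v)
    return modes, best


data = [1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]
# sorted() is a stand-in here; swap in a radix sort for the O(N) variant
print(modes_of_sorted(sorted(data)))
```

The scan itself is a single pass with constant extra state apart from the list of tied modes.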

... yeah, yeah, the RadixSort is not an in-place sort, and thus involves something you could call a hashtable (a collection of collections indexed by the current digit). However, the hashtable is used to sort, not to calculate the mode.

I'm going to say that on an unsorted list, it would be impossible to compute the mode in linear time without involving a hashtable SOMEWHERE. On a sorted list, the second half of this algorithm works by just keeping track of the current max count.


Definitely sounds like homework. But try this: go through the list once and find the largest number. Create an array of integers with that many elements, all initialized to zero. Then go through the list again and, for each number, increment the element at the corresponding index by 1. Finally, scan your array and return the index that has the highest value. This runs in roughly linear time (linear in the list length plus the largest value), whereas any algorithm that includes a comparison sort will take N log N time or worse. However, this solution is a memory hog; it basically builds a whole histogram just to give you one number from it.

Remember that many (but not all) languages use arrays that are zero-based, so when converting from a "natural" number to an index, subtract one, and then add one to go from index to natural number.
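One way to handle the value-to-index conversion is to offset every value by the smallest one in the list, which also copes with negative numbers. A minimal sketch, assuming a non-empty list of integers (mode_by_counting is a made-up name, and the min offset is my addition, not part of the description above):

```python
def mode_by_counting(values):
    """Mode via a counting array; assumes a non-empty list of integers."""
    lo = min(values)
    hi = max(values)
    # One counter per possible value in [lo, hi]
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1          # value -> index
    best_index = 0
    for i in range(1, len(counts)):
        if counts[i] > counts[best_index]:
            best_index = i
    return best_index + lo           # index -> value


print(mode_by_counting([1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]))
```

On a tie this returns the smallest of the tied values. Memory still grows with the range hi - lo, so the memory-hog caveat still applies.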


If you don't want to use a hash, use a modified binary search trie (with a counter per node). For each element in the array insert into the trie. If it already exists in the trie, increment the counter. At the end, find the node with the highest counter.
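Reading that as a plain binary search tree with a counter per node, a rough sketch could look like this. Node and mode_by_tree are made-up names, the tree is not balanced, and the best node is tracked during insertion instead of in a final traversal. Expect O(N log N) on average and worse on already sorted input, so this is not O(N).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.count = 1
        self.left = None
        self.right = None


def mode_by_tree(values):
    """Mode via an unbalanced BST with counters; assumes a non-empty list."""
    root = Node(values[0])
    best = root
    for v in values[1:]:
        node = root
        # Iterative descent: bump the counter if found, else insert a new leaf
        while True:
            if v == node.key:
                node.count += 1
                break
            if v < node.key:
                if node.left is None:
                    node.left = Node(v)
                    node = node.left
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(v)
                    node = node.right
                    break
                node = node.right
        if node.count > best.count:
            best = node
    return best.key


print(mode_by_tree([1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]))
```

The insertion loop is iterative, so no recursion or hash table is involved.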

Of course you can also use a hashmap that maps to a counter variable and will work the same way. I don't understand your complaint about it not being iterative... You iterate through the array, and then you iterate through the members of the hashmap to find the highest counter.


Just use counting sort and look into the array which stores the number of occurrences for each entity.


I prepared two implementations in Python with different space and time complexity:

The first one uses an "occurrence array". It takes O(n + k) time and O(k + 1) space, where n is the size of the input and k is the greatest number in the input.

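data = [1,2,3,8,4,6,1,3,7,9,6,1,9]  # renamed from "input" to avoid shadowing the built-in

def find_max(tab):
    # Linear scan for the largest value
    max_value = tab[0]
    for i in range(1, len(tab)):
        if tab[i] > max_value:
            max_value = tab[i]
    return max_value

# C[v] holds the number of occurrences of value v (non-negative integers only)
C = [0] * (find_max(data) + 1)
print(len(C))

def count_occurences(tab):
    max_occurence = C[0]
    max_occurence_index = 0
    for i in range(0, len(tab)):
        C[tab[i]] = C[tab[i]] + 1
        # Track the most frequent value while counting
        if C[tab[i]] > max_occurence:
            max_occurence = C[tab[i]]
            max_occurence_index = tab[i]
    return max_occurence_index

print(count_occurences(data))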

NOTE: Consider an unfortunate input such as [1, 10^8, 1, 1, 1]: it requires an array of length k+1 = 100000001.

The second solution sorts the input before searching for the mode. I used radix sort, which has time complexity O(kn), where k is the number of digits in the longest number and n is the size of the input array. We then iterate over the whole sorted array of size n to find the longest run of equal numbers, which is the mode.

data = [1,2,3,8,4,6,1,3,7,9,6,1,9]  # renamed from "input" to avoid shadowing the built-in

def radix_sort(A):
    # LSD radix sort in base 10; works for non-negative integers only
    if not A:
        return []
    largest = max(A)
    div = 1
    # One pass per decimal digit of the largest value
    while largest // div > 0:
        the_buckets = [[] for _ in range(10)]
        for value in A:
            ldigit = (value // div) % 10
            the_buckets[ldigit].append(value)
        # Concatenate the buckets in order; appending keeps each pass stable
        A = []
        for b in the_buckets:
            A.extend(b)
        div = div * 10
    return A

def find_mode_in_sorted(A):
    # Assumes A is sorted and non-empty
    mode = A[0]
    number_of_occurences = 1   # longest run seen so far
    run_length = 1             # length of the current run
    for i in range(1, len(A)):
        if A[i] == A[i-1]:
            run_length = run_length + 1
        else:
            run_length = 1
        if run_length > number_of_occurences:
            mode = A[i]
            number_of_occurences = run_length
    return mode  # , number_of_occurences

s_input = radix_sort(data)
print(find_mode_in_sorted(s_input))


Using JavaScript:

const mode = (arr) => {
    // Note: numMapping is a plain object used as a hash map of value -> count
    let numMapping = {};
    let mode;
    let greatestFreq = 0;
    for (let i = 0; i < arr.length; i++) {
        if (numMapping[arr[i]] === undefined) {
            numMapping[arr[i]] = 0;
        }
        numMapping[arr[i]] += 1;
        // Track the most frequent value while counting
        if (numMapping[arr[i]] > greatestFreq) {
            greatestFreq = numMapping[arr[i]];
            mode = arr[i];
        }
    }
    return mode;  // undefined for an empty array
};