How to calculate the mode of an unsorted array of integers in O(N)?

2023-01-17 09:43 Q&A author:

...using an iterative procedure (no hash table)?

It's not homework. And by mode I mean the most frequent number (statistical mode). I don't want to use a hash table because I want to know how it can be done iteratively.

OK Fantius, how bout this?

Sort the list with a RadixSort (BucketSort) algorithm (technically O(N) time; the numbers must be integers). Start at the first element, remember its value and start a count at 1. Iterate through the list, incrementing the count, until you reach a different value. If the count for that value is higher than the current high count, remember that value and count as the mode. If you get a tie with the high count, remember both (or all) numbers.

... yeah, yeah, the RadixSort is not an in-place sort, and thus involves something you could call a hashtable (a collection of collections indexed by the current digit). However, the hashtable is used to sort, not to calculate the mode.

I'm going to say that on an unsorted list, it would be impossible to compute the mode in linear time without involving a hashtable SOMEWHERE. On a sorted list, the second half of this algorithm works by just keeping track of the current max count.
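
The scan over a sorted list described above, including the "remember both (or all) numbers" rule for ties, could be sketched roughly like this in Python. The name modes_of_sorted is made up for illustration, and the built-in sorted() (a comparison sort, not a radix sort) is used only to keep the example short:

```python
def modes_of_sorted(values):
    """Return (modes, count) for an already sorted, non-empty list."""
    modes = [values[0]]
    best_count = 1
    run_count = 1
    for prev, cur in zip(values, values[1:]):
        # Equal neighbours extend the current run; a new value starts a run of 1.
        run_count = run_count + 1 if cur == prev else 1
        if run_count > best_count:
            best_count = run_count
            modes = [cur]          # strictly better: forget earlier candidates
        elif run_count == best_count:
            modes.append(cur)      # tie with the high count: remember both
    return modes, best_count

# Assumption: sorted() stands in for the radix sort to keep the sketch short.
print(modes_of_sorted(sorted([3, 1, 2, 3, 1, 5])))
```

The scan itself is a single pass, so the overall cost is dominated by whichever sort is used.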

Definitely sounds like homework. But, try this: go through the list once, and find the largest number. Create an array of integers with that many elements, all initialized to zero. Then, go through the list again, and for each number, increment the element at the equivalent index of the array by 1. Finally, scan your array and return the index that has the highest value. This will execute in roughly linear time (one pass over the list plus one pass over the array), whereas any algorithm that includes a comparison sort will take N log N time or worse. However, this solution is a memory hog; it'll basically build a full histogram just to give you one number from it.

Remember that many (but not all) languages use arrays that are zero-based, so when converting from a "natural" number to an index, subtract one, and then add one to go from index to natural number.
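
Here is a rough Python sketch of the counting approach. It is one possible variation, not necessarily what the answer had in mind: the count array is offset by the smallest value rather than by one, so zero and negative integers also work. The name mode_by_counting is hypothetical.

```python
def mode_by_counting(values):
    """Mode of a non-empty list of integers via a count array offset by the minimum."""
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)   # index 0 stands for the value lo
    for v in values:
        counts[v - lo] += 1        # value -> index: subtract the offset
    # Scan the counts once; on ties the smallest value wins.
    best_index = 0
    for i in range(1, len(counts)):
        if counts[i] > counts[best_index]:
            best_index = i
    return best_index + lo         # index -> value: add the offset back

print(mode_by_counting([-2, 7, 7, -2, 7, 0]))  # 7
```

Memory still grows with the range of the values (hi - lo + 1), not with the length of the list.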

If you don't want to use a hash, use a modified binary search trie (with a counter per node). For each element in the array insert into the trie. If it already exists in the trie, increment the counter. At the end, find the node with the highest counter.

Of course you can also use a hashmap that maps to a counter variable and will work the same way. I don't understand your complaint about it not being iterative... You iterate through the array, and then you iterate through the members of the hashmap to find the highest counter.
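
A minimal sketch of the tree idea, assuming "trie" above means a plain (unbalanced) binary search tree with a counter per node. Node and mode_with_tree are made-up names, and instead of a final walk over the tree this version tracks the best node while inserting. Note that this is not O(N): inserts cost O(log N) each on average and O(N) each in the worst case (e.g. already sorted input).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.count = 1
        self.left = None
        self.right = None

def mode_with_tree(values):
    """Mode of a non-empty list using an unbalanced binary search tree of counters."""
    root = Node(values[0])
    best = root
    for v in values[1:]:
        node = root
        while True:
            if v == node.key:
                node.count += 1            # already in the tree: bump its counter
                break
            side = "left" if v < node.key else "right"
            child = getattr(node, side)
            if child is None:
                child = Node(v)            # not found: insert a new leaf
                setattr(node, side, child)
                node = child
                break
            node = child
        # node is now the tree node for v; keep whichever node has the highest count.
        if node.count > best.count:
            best = node
    return best.key

print(mode_with_tree([4, 2, 4, 9, 2, 4]))  # 4
```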

Just use counting sort and look at the array that stores the number of occurrences of each entity.

I prepared two implementations in Python with different space and time complexity:

The first one uses an "occurrence array". It runs in O(n + k) time and needs O(k + 1) space, where n is the size of the input and k is the greatest number in the input.

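data = [1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]

def find_max(tab):
    # Linear scan for the largest value (assumes a non-empty list).
    largest = tab[0]
    for value in tab:
        if value > largest:
            largest = value
    return largest

# One counter per possible value 0..max; values must be non-negative integers.
C = [0] * (find_max(data) + 1)
print(len(C))

def count_occurrences(tab):
    max_occurrence = C[0]
    max_occurrence_index = 0
    for value in tab:
        C[value] += 1
        # Update the running mode as soon as a count beats the best so far.
        if C[value] > max_occurrence:
            max_occurrence = C[value]
            max_occurrence_index = value
    return max_occurrence_index

print(count_occurrences(data))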

NOTE: Consider a pathological input such as [1, 10^8, 1, 1, 1]: it needs an array of length k+1 = 100000001.

The second solution sorts the input before searching for the mode. I used radix sort, which has time complexity O(kn), where k is the number of digits in the longest number and n is the size of the input array. Then we iterate over the whole sorted array of size n to find the longest run of equal numbers, whose value is the mode.

data = [1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]

def radix_sort(A):
    # LSD radix sort in base 10; works for non-negative integers only.
    len_A = len(A)
    mod = 10  # keeps the digits up to the current position
    div = 1   # shifts the current digit into the ones place
    while True:
        the_buckets = [[] for _ in range(10)]
        for value in A:
            ldigit = (value % mod) // div
            the_buckets[ldigit].append(value)
        mod = mod * 10
        div = div * 10
        # Once every value lands in bucket 0, all digits have been processed.
        if len(the_buckets[0]) == len_A:
            return the_buckets[0]
        A = [value for bucket in the_buckets for value in bucket]

def find_mode_in_sorted(A):
    # Equal values are adjacent in a sorted list, so track the current run length.
    mode = A[0]
    number_of_occurrences = 1
    number_of_occurrences_candidate = 1
    for i in range(1, len(A)):
        if A[i] == A[i - 1]:
            number_of_occurrences_candidate += 1
        else:
            number_of_occurrences_candidate = 1
        if number_of_occurrences_candidate > number_of_occurrences:
            mode = A[i]
            number_of_occurrences = number_of_occurrences_candidate
    return mode  # , number_of_occurrences

s_data = radix_sort(data)
print(find_mode_in_sorted(s_data))

Using JavaScript:

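const mode = (arr) => {
    // Plain object used as a value -> count table.
    const numMapping = {};
    let mode;
    let greatestFreq = 0;
    for (let i = 0; i < arr.length; i++) {
        if (numMapping[arr[i]] === undefined) {
            numMapping[arr[i]] = 0;
        }
        numMapping[arr[i]] += 1;
        // Track the most frequent value seen so far.
        if (numMapping[arr[i]] > greatestFreq) {
            greatestFreq = numMapping[arr[i]];
            mode = arr[i];
        }
    }
    // mode comes straight from arr, so it is already a number (undefined for an empty array).
    return mode;
};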
