How to get the most represented object from an array

2022-12-18 15:59

I have an array with some objects, and there are several objects that are alike. E.g.: fruit = [apple, orange, apple, banana, banana, orange, apple, apple]

What is the most efficient way to get the most represented object from this array? In this case it would be "apple", but how would you go about calculating that efficiently?

Don't reinvent the wheel. In Python 2.7+ you can use the Counter class:

import collections
fruit=['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
c=collections.Counter(fruit)
print(c.most_common(1))
# [('apple', 4)]

If you are using an older version of Python, then you can download Counter here.

While it's good to know how to implement something like this yourself, it's also a good idea to get used to using Counter, since it is part of the standard library (from Python 2.7 and 3.1 onward).
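If only the winning item is needed rather than the (item, count) pair, a minimal sketch could look like the following. The helper name most_represented is hypothetical, and returning None for an empty input is an assumed convention, not something Counter does for you.

```python
from collections import Counter

def most_represented(items):
    """Return the most frequent item, or None for an empty input (assumed convention)."""
    counts = Counter(items)
    if not counts:
        return None
    # most_common(1) returns a one-element list of (item, count) pairs
    item, _count = counts.most_common(1)[0]
    return item

fruit = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
print(most_represented(fruit))  # apple
```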

If the objects are hashable then you can use a dict to store the counts:

import operator

results = {}
for item in somelist:
    if item not in results:
        results[item] = 1
    else:
        results[item] += 1

# (item, count) pair with the highest count
print(max(results.items(), key=operator.itemgetter(1)))

Keep a dictionary of how often each object appears.

Walk through the list once, building this table. As you go, keep track of which object has appeared the most often so far.

This code is untested.

from collections import defaultdict

def mode(objects):
    h = defaultdict(int)
    max_f = 0
    max_obj = None
    for o in objects:
        f = h[o] = h[o] + 1
        if f > max_f:
            max_f = f
            max_obj = o
    return max_obj

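If the objects are not hashable, you can instead hash some unique feature of them, such as id(o). Note that id(o) identifies an object by identity, so two equal but distinct objects are counted separately.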
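One way to sketch this for unhashable objects is to count a hashable key derived from each object. The helper mode_by_key and its key parameter are made up for illustration; a value-based key such as tuple is assumed here so that equal lists are counted together.

```python
from collections import defaultdict

def mode_by_key(objects, key):
    """Return the most frequent object, comparing objects by key(obj)."""
    counts = defaultdict(int)
    best_obj, best_count = None, 0
    for o in objects:
        k = key(o)              # hashable stand-in for the unhashable object
        counts[k] += 1
        if counts[k] > best_count:
            best_obj, best_count = o, counts[k]
    return best_obj

# Lists are unhashable, so count them by their tuple form
pairs = [[1, 2], [3, 4], [1, 2]]
print(mode_by_key(pairs, key=tuple))  # [1, 2]
```

Passing key=id instead would treat the two [1, 2] lists as different objects, which is rarely what you want when counting by value.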

You want an efficient method. Clearly it's possible in O(n) time, so any method that requires sorting the list is out as that would be O(n log(n)). It's not possible to do it faster than O(n) because even if you check the first n/2-1 elements, and they are all "apple", you don't know that the rest of the elements won't be bananas.

So given that we're looking for O(n), you must iterate over the list and keep a count of how many items of each type you've seen.

A defaultdict would be a simple way to implement this in practice.

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ['apple', 'banana', 'apple']:
...    d[i] += 1
...
>>> d
defaultdict(<class 'int'>, {'apple': 2, 'banana': 1})
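Once the counts are in d, the most represented key can be read off with max. A small sketch continuing the same idea, written as a script rather than a REPL session (the variable name winner is just for illustration):

```python
from collections import defaultdict

d = defaultdict(int)
for i in ['apple', 'banana', 'apple']:
    d[i] += 1

# d.get maps each key to its count, so max picks the key with the largest count
winner = max(d, key=d.get)
print(winner, d[winner])  # apple 2
```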

The best time you can hope to achieve here is O(n) - you'll always need to walk the entire array at least once. The easiest way will certainly be to build a histogram. If your dictionary structure (map of some kind) offers O(1) insert and retrieve then this is as easy as (groovy-ish pseudocode):

def histogram = new HashMap()
def maxObj = null
def maxObjCount = 0
objectList.each {
    if(histogram.containsKey(it)) histogram.put(it, histogram.get(it)+1)
    else histogram.put(it, 1)

    if(histogram.get(it) > maxObjCount) {
        maxObj = it
        maxObjCount = histogram.get(it)
    }
}

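from functools import reduce  # reduce is not a builtin in Python 3

def count_reps(agg, item):
  # reduce passes the accumulator first, then the current item
  k = hash(item)
  try:
    agg[k] += 1
  except KeyError:
    agg[k] = 1
  return agg

item_dict = reduce(count_reps, your_array, {})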

item_dict will contain the counts, keyed by each item's hash; you can then rank the objects by popularity.
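To rank the objects by popularity, the counts can be sorted by value. This sketch assumes the counts are keyed by the items themselves rather than by hash(item), so the result stays readable; the names count_item and rank_by_count are hypothetical.

```python
from functools import reduce

def count_item(agg, item):
    # Accumulator first, item second, as functools.reduce expects
    agg[item] = agg.get(item, 0) + 1
    return agg

def rank_by_count(items):
    """Return (item, count) pairs, most frequent first."""
    counts = reduce(count_item, items, {})
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

fruit = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
ranking = rank_by_count(fruit)
print(ranking[0])  # ('apple', 4)
```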

Here's a different approach, which sorts the list and then processes it in sorted order.

fruits = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']

max_fruit_count = 0
max_fruit = ''
current_fruit_count = 0
current_fruit = ''
for fruit in sorted(fruits) :
    if fruit != current_fruit :
        if current_fruit != max_fruit :
            if current_fruit_count > max_fruit_count :
                max_fruit = current_fruit
                max_fruit_count = current_fruit_count
        current_fruit = fruit
        current_fruit_count = 1
    else :
        current_fruit_count += 1

if current_fruit_count > max_fruit_count :
    max_fruit = current_fruit
    max_fruit_count = current_fruit_count

print(max_fruit, max_fruit_count)
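The same sort-then-scan idea can be sketched more compactly with itertools.groupby, which yields one run per distinct item once the list is sorted. The function name mode_sorted and the (None, 0) result for an empty list are assumptions for illustration. Like the loop above, this is O(n log n) because of the sort, and it needs the items to be orderable.

```python
from itertools import groupby

def mode_sorted(items):
    """Return (item, count) for the longest run in sorted order; (None, 0) if empty."""
    best_item, best_count = None, 0
    # After sorting, equal items are adjacent, so groupby yields one run per distinct item
    for item, run in groupby(sorted(items)):
        count = sum(1 for _ in run)
        if count > best_count:
            best_item, best_count = item, count
    return best_item, best_count

fruits = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
print(mode_sorted(fruits))  # ('apple', 4)
```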

This is not O(n) but O(n^2) in the worst case (one pass of items.count per unique item), so it may not fit your bill as the "most efficient way", but it's compact and avoids explicit for loops, which are rather slow in Python. It will be faster than the O(n) option up to 11 unique items.

def most_common(items):
    s = set(items)
    return max([(items.count(i), i) for i in s])[1]
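A variant of the same idea passes items.count as the key to max, which skips the intermediate list of tuples and does not need the items themselves to be orderable when counts tie. The name most_common_by_key is hypothetical, and the cost is unchanged: one full scan of the list per unique item.

```python
def most_common_by_key(items):
    # items.count scans the whole list once for each unique item
    return max(set(items), key=items.count)

fruit = ['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
print(most_common_by_key(fruit))  # apple
```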

As ~unutbu says: use collections.Counter. Failing that, time your code. Here is my (likely inefficient) approach:

python -m timeit -s "fruit = ['apple']*4 + ['banana'] + ['orange']*2" \
"kL = set(fruit);  L = [fruit.count(f) for f in kL];  D = dict(zip(kL,L)); \
sorted(D,key = lambda k: D[k],reverse=True)" 
100000 loops, best of 3: 10.1 usec per loop
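The same kind of measurement can be done from inside Python with the timeit module. This sketch compares Counter against a set/count approach on the small list used above; the function names are hypothetical, and timings vary by machine, so none are shown.

```python
import timeit
from collections import Counter

fruit = ['apple'] * 4 + ['banana'] + ['orange'] * 2

def with_counter(items):
    return Counter(items).most_common(1)[0][0]

def with_count(items):
    return max(set(items), key=items.count)

for fn in (with_counter, with_count):
    # number=10000 keeps the run short; raise it for steadier numbers
    seconds = timeit.timeit(lambda: fn(fruit), number=10000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10000 calls")
```

Rerunning with a longer list and more unique items should show where the quadratic approach stops paying off.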
