How to find most common elements of a list? [duplicate]

2023-01-14 13:59 Q&A author:

This question already has answers here: Find the item with maximum occurrences in a list [duplicate] (14 answers) Closed 2 years ago.

The community reviewed whether to reopen this question 9 months ago and left it closed:

Original close reason(s) were not resolved

Given the following list

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']

I am trying to count how many times each word appears and display the top 3.

However, I only want the top three among words whose first letter is capitalized, ignoring all words that do not start with a capital letter.

I am sure there is a better way than this, but my idea was to do the following:

put the first word in the list into another list called uniquewords
delete the first word and all its duplicates from the original list
add the new first word into uniquewords
delete the first word and all its duplicates from the original list
etc...
until the original list is empty...
count how many times each word in uniquewords appears in the original list
find the top 3 and print them
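For comparison, the plan above can be written out almost literally in plain Python. This is a minimal sketch, not the asker's code: it assumes the deletions happen on a copy of the list (otherwise nothing would be left to count afterwards), and the function name top_capitalized is hypothetical. Because it rescans the list for every unique word, it is much slower than the dictionary or Counter approaches in the answers.

```python
def top_capitalized(words, n=3):
    """Hypothetical helper that follows the plan step by step."""
    # Assumption: delete from a copy so the original list can still be counted later
    remaining = list(words)
    uniquewords = []
    # Move each new first word into uniquewords and drop all its copies
    while remaining:
        first = remaining[0]
        uniquewords.append(first)
        remaining = [w for w in remaining if w != first]
    # Keep only words starting with a capital letter (word[:1] is safe for '')
    capitalized = [w for w in uniquewords if w[:1].isupper()]
    # Count each unique word in the original list
    counts = [(words.count(w), w) for w in capitalized]
    # Sort by count, highest first, and keep the top n
    counts.sort(reverse=True)
    return counts[:n]
```

Calling top_capitalized(word_list) on the question's list returns (count, word) pairs for the three most frequent capitalized words.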

In Python 2.7 and above there is a class called Counter which can help you:

from collections import Counter
# word_list is the list from the question; word[:1] also copes with the empty string
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))

Result:

[('Jellicle', 6), ('Cats', 5), ('And', 2)]

Regarding "I am quite new to programming so please try and do it in the most barebones fashion":

You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words, adding them to the dictionary if they are not present, or else increasing the count for the word if it is present. Then, to find the top three, you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or you can use an O(n) algorithm that scans the list once, remembering only the top three elements.
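The single-scan option can be sketched as below. The helper names count_words and top_three are hypothetical. The dictionary is filled exactly as described, and the second function walks the (word, count) pairs once while keeping a list of at most three best entries, so each step does a constant amount of work and the whole thing stays O(n).

```python
def count_words(words):
    """Hypothetical helper: build the word -> count dictionary."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

def top_three(counts):
    """Hypothetical helper: one pass, remembering only the best three."""
    best = []  # (count, word) pairs, highest count first, never more than 3
    for word, count in counts.items():
        # Walk left past every kept entry with a strictly smaller count
        pos = len(best)
        while pos > 0 and best[pos - 1][0] < count:
            pos -= 1
        if pos < 3:
            best.insert(pos, (count, word))
            if len(best) > 3:
                best.pop()  # drop the entry that fell out of the top three
    return [(word, count) for count, word in best]
```

Filtering first, for example top_three(count_words([w for w in word_list if w[:1].isupper()])), gives the capitalized top three. On Python 3.7+, where dictionaries keep insertion order, ties go to the word that was counted first.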

An important observation for beginners is that by using built-in classes that are designed for the purpose, you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.

If you are using an earlier version of Python or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
...     if word in word_counter:
...         word_counter[word] += 1
...     else:
...         word_counter[word] = 1
... 
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>> 
>>> top_3 = popular_words[:3]
>>> 
>>> top_3
['Jellicle', 'Cats', 'and']

Top Tip: The interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.

To just return a list containing the most common words:

from collections import Counter
words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
# keep only the word from each (word, count) tuple
most_common_words = [word for word, word_count in Counter(words).most_common(3)]
print(most_common_words)

this prints:

['you', 'i', 'a']

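The 3 in most_common(3) specifies how many items to return. Counter(words).most_common() returns a list of tuples, each with the word as its first member and its frequency as its second, ordered from most to least frequent. Words with equal counts have no guaranteed order in older Python versions; since Python 3.7 they appear in the order they were first encountered, so there the third word above would be 'love' rather than 'a'.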

most_common = Counter(words).most_common()
print(most_common)
[('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love', 1), ('fine', 1)]

The "for word, word_count in" part unpacks each tuple, so the list comprehension keeps only its first member (the word).

Isn't it just this:

word_list=['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', ''] 

from collections import Counter
c = Counter(word_list)
c.most_common(3)

Which should output

[('Jellicle', 6), ('Cats', 5), ('are', 3)]

nltk is convenient for a lot of language processing stuff. It has methods for frequency distribution built in. Something like:

import nltk
fdist = nltk.FreqDist(your_list) # creates a frequency distribution from a list
most_common = fdist.max()    # returns a single element
top_three = fdist.most_common(3)  # returns a list of (element, count) tuples

A simple, two-line solution to this, which does not require any extra modules, is the following code:

lst = ['Jellicle', 'Cats', 'are', 'black', 'and','white,',
       'Jellicle', 'Cats','are', 'rather', 'small;', 'Jellicle', 
       'Cats', 'are', 'merry', 'and','bright,', 'And', 'pleasant',    
       'to','hear', 'when', 'they', 'caterwaul.','Jellicle', 
       'Cats', 'have','cheerful', 'faces,', 'Jellicle',
       'Cats','have', 'bright', 'black','eyes;', 'They', 'like',
       'to', 'practise','their', 'airs', 'and', 'graces', 'And', 
       'wait', 'for', 'the', 'Jellicle','Moon', 'to', 'rise.', '']

lst_sorted=sorted([ss for ss in set(lst) if len(ss)>0 and ss.istitle()], 
                   key=lst.count, 
                   reverse=True)
print(lst_sorted[:3])

Output:

['Jellicle', 'Cats', 'And']

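The list comprehension in square brackets returns all unique strings in the list that are non-empty and in title case (istitle() requires the first letter to be upper-case and the following letters lower-case, so a word like 'McCat' would be skipped). The sorted() function then sorts them by how often they appear in the list (using lst.count as the key), in descending order. Note that lst.count rescans the whole list for every unique word, so this gets slow on long lists.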

There are two standard library ways to find the most frequent value in a list:

statistics.mode:

from statistics import mode
most_common = mode([3, 2, 2, 2, 1, 1])  # 2
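most_common = mode([3, 2])  # 3 on Python 3.8+ (first mode found); StatisticsError: no unique mode before 3.8

Before Python 3.8, raises an exception if there's no unique most frequent value (since 3.8, returns the first one encountered)
Only returns a single most frequent value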
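On Python 3.8 or later, if you want every tied most-frequent value instead of an exception or the first one found, statistics.multimode returns all of them as a list, in first-seen order. It still only gives the values sharing the maximum count, not a top three.

```python
from statistics import multimode  # available since Python 3.8

print(multimode([3, 2, 2, 2, 1, 1]))  # [2]
print(multimode([3, 2]))              # [3, 2]: both values are tied
print(multimode([]))                  # []: empty input gives an empty list, no exception
```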

collections.Counter.most_common:

from collections import Counter
most_common, count = Counter([3, 2, 2, 2, 1, 1]).most_common(1)[0]  # 2, 3
(most_common_1, count_1), (most_common_2, count_2) = Counter([3, 2, 2]).most_common(2)  # (2, 2), (3, 1)

Can return multiple most frequent values
Returns element count as well

So for this question, which asks for the top three, the second one is the right choice. As a side note, performance is essentially the same, since in CPython statistics.mode is itself built on Counter.

The simple way of doing this would be (assuming your list is in 'l'):

>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
...
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

Complete sample:

>>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
... 
>>> counter
{'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

By simple, I mean something that works in nearly every version of Python.

If you don't understand some of the functions used in this sample, you can always run these in the interpreter (after pasting the code above):

>>> help(counter.get)
>>> help(sorted)

The answer from @Mark Byers is best, but if you are on a version of Python < 2.7 (but at least 2.5, which is pretty old these days), you can replicate the Counter class functionality very simply via defaultdict (otherwise, for Python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnnysweb's answer).

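from collections import defaultdict

class Counter(object):
    """Minimal stand-in for collections.Counter on Python 2.5/2.6."""
    def __init__(self, items):
        d = defaultdict(int)  # missing keys start at 0
        for i in items:
            d[i] += 1
        # store (item, count) pairs, most frequent first
        self._items = sorted(d.items(), reverse=True, key=lambda pair: pair[1])
    def most_common(self, n):
        return self._items[:n]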

Then, you use the class exactly as in Mark Byers's answer, i.e.:

words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))

I would like to answer this with NumPy, a powerful array-computation module for Python.

Here is a code snippet:

import numpy
a = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']
dict(zip(*numpy.unique(a, return_counts=True)))

Output

{'': 1, 'And': 2, 'Cats': 5, 'Jellicle': 6, 'Moon': 1, 'They': 1, 'airs': 1, 'and': 3, 'are': 3, 'black': 2, 'bright': 1, 'bright,': 1, 'caterwaul.': 1, 'cheerful': 1, 'eyes;': 1, 'faces,': 1, 'for': 1, 'graces': 1, 'have': 2, 'hear': 1, 'like': 1, 'merry': 1, 'pleasant': 1, 'practise': 1, 'rather': 1, 'rise.': 1, 'small;': 1, 'the': 1, 'their': 1, 'they': 1, 'to': 3, 'wait': 1, 'when': 1, 'white,': 1}

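The output is a dictionary whose keys are the words and whose values are how many times each word appears. Because numpy.unique returns the words sorted alphabetically, the dictionary is not ranked by count.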
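To get the ranked top three capitalized words with NumPy, one option is to sort the counts with argsort. This is a minimal sketch assuming NumPy is installed; top_n_capitalized is a hypothetical name. Negating the counts lets an ascending stable sort put the highest counts first while keeping tied words in alphabetical order.

```python
import numpy as np

def top_n_capitalized(words, n=3):
    """Hypothetical helper: top n capitalized words as (word, count) pairs."""
    # Keep only words whose first character is upper-case (word[:1] is safe for '')
    capitalized = [w for w in words if w[:1].isupper()]
    # Unique words (sorted alphabetically) and how often each occurs
    uniq, counts = np.unique(capitalized, return_counts=True)
    # Highest counts first; the stable sort keeps tied words alphabetical
    order = np.argsort(-counts, kind="stable")[:n]
    # Convert NumPy scalars back to plain str and int
    return [(str(uniq[i]), int(counts[i])) for i in order]
```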

This answer is inspired by another answer on Stack Overflow, which you can view here.

If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item and its count, you can iterate over its most_common() list like so:

from collections import Counter

top_10_words = Counter(my_long_list_of_words).most_common(10)  # list of (word, count) tuples
# Iterate over the (word, count) pairs
for word in top_10_words:
    # print the word
    print(word[0])
    # print the count
    print(word[1])

or, to iterate through it in a Django-style template:

{% for word in top_10_words %}
        <p>Word: {{ word.0 }}</p>
        <p>Count: {{ word.1 }}</p>
{% endfor %}

Hope this helps someone
