开发者

Sorting a counted list in Python

(I am brand new to any kind of programming so please be as specific as you can when you answer) Problem: I have written a program to solve pythonchallenge.com level 2. The program works but the results are messy. I want to sort the results of the character count into a nice looking list. When I try to sort the results of the character count using sorted() it removes all the counts and just gives me a list of the characters that were in my string. I need to be able to keep the ability to see how much of each character was in my file. Anyway here is the code:

countstring = open('pagesource.txt').read()

charcount = {}

for x in countstring:
    charcount[x] = charcount.get(x, 0) + 1

print charcount

this is what i get in cmd:

>>> {'\n': 1219, '!': 6079, '#': 6115, '%': 6104, '$': 6046, '&': 6043, ')': 6186, '
(': 6154, '+': 6066, '*': 6034, '@': 6157, '[': 6108, ']': 6152, '_': 6112, '^':
 6030, 'a': 1, 'e': 1, 'i': 1, 'l': 1, 'q': 1, 'u': 1, 't': 1, 'y': 开发者_开发知识库1, '{': 6046
, '}': 6105}

if I add a sorted() function such as print sorted(charcount) to it I get this in cmd:

>>> ['\n', '!', '#', '$', '%', '&', '(', ')', '*', '+', '@', '[', ']', '^', '_', 'a'
, 'e', 'i', 'l', 'q', 't', 'u', 'y', '{', '}']

Thanks for your solutions and if you can take the time to add comments to your code explaining what everything does I would greatly appreciate it!


You should really use the Counter class instead of reinventing your own wheel.

charcount is a dictionary, and dictionaries have no implicit sort order. Therefore, we'll have to convert it to a list, which can be sorted. Each entry in that list will be a tuple of count and character.

charcount.items() already gives us a list that looks like [('\n', 1219), ('!', 6079)]. Unfortunately, if we would sort this list, it would sort by character first and then (if characters were ever equal) by count instead of the other way round. Therefore, we need a key function to tell sort to look at count first, and then (if counts are equal) the character. Fortunately, our key function is really simple; it just swaps around the tuple:

lambda (char,count): (count, char)

Alternatively, we could use a list comprehension to swap the values, to get something like: [('\n', 1219), ('!', 6079)], then sort, and then swap the values again.

charcount_list = sorted(charcount.items(), key=lambda (char,count):(count, char))

charcount_list will now be:

[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('t', 1), ('u', 1), ('y', 1),
 ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046),
 ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112),
 ('#', 6115), (']', 6152), (' (', 6154), ('@', 6157), (')', 6186)]

If you want the reverse order, simply specify the reverse=True argument to sorted.


>>> from operator import itemgetter
>>> sorted(charcount.items(), key=itemgetter(1))
[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('u', 1), ('t', 1), ('y', 1), ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046), ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112), ('#', 6115), (']', 6152), (' (', 6154), ('@', 6157), (')', 6186)]


charcount is a dict (dictionary). Iterating a dictionary iterates over it's keys, that's why sorted() results in a sorted list of keys.

You need to get list of items then sort it by the second value:

sorted(charcount.items(), key=lambda t: t[1])


Dictionaries ( what {} means) are unordered collections. Which means you can't sort them in any kind of meaningful way. I suggest storing the information as a list of tuples [(), ...] and then sorting them based on that.

foo = [('a', 123), ('b', 345)]

def key_function(x):
    return x[1]

sorted_list = sorted(foo, key_function)
print sorted_list

As you can see, sorted takes an optional second parameter. The purpose of that parameter is to provide a function that tells sorted how to sort something. All you're doing is breaking down the information in each tuple in the list to provide a value that can be ordered, since you can't really order a list of tuples in any meaningful way.

Make sense?

It can also be written like: print sorted(foo, key=lambda (x,y): y)

lambda just means an inline function with no name, and it allows you to break down the tuple in a different way.

You can see how this works by doing print [y for (x,y) in sorted_list]

You can even redefine the key function from before like this:

def key_function(x):
    x,y = x
    return y

BTW, I only put in the parentheses before for clarity. If you're not defining a function then the comma is the tuple constructor.


sorted(charcount.items(), key=lambda item: item[1])


Dictionary is iterated by key, so you get a sorted list of keys when you pass the dictionary to sorted. Sort the dictionary's item tuples by value to get a list of sorted tuples.

sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])

If you're using Python 2.7+, then you can use the list of tuples to initialize an OrderedDict, which will maintain the sorted order of item tuples.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜