Sorting a counted list in Python
(I am brand new to any kind of programming so please be as specific as you can when you answer) Problem: I have written a program to solve pythonchallenge.com level 2. The program works but the results are messy. I want to sort the results of the character count into a nice looking list. When I try to sort the results of the character count using sorted() it removes all the counts and just gives me a list of the characters that were in my string. I need to be able to keep the ability to see how much of each character was in my file. Anyway here is the code:
countstring = open('pagesource.txt').read()
charcount = {}
for x in countstring:
charcount[x] = charcount.get(x, 0) + 1
print charcount
this is what i get in cmd:
>>> {'\n': 1219, '!': 6079, '#': 6115, '%': 6104, '$': 6046, '&': 6043, ')': 6186, '
(': 6154, '+': 6066, '*': 6034, '@': 6157, '[': 6108, ']': 6152, '_': 6112, '^':
6030, 'a': 1, 'e': 1, 'i': 1, 'l': 1, 'q': 1, 'u': 1, 't': 1, 'y': 开发者_开发知识库1, '{': 6046
, '}': 6105}
if I add a sorted() function such as print sorted(charcount) to it I get this in cmd:
>>> ['\n', '!', '#', '$', '%', '&', '(', ')', '*', '+', '@', '[', ']', '^', '_', 'a'
, 'e', 'i', 'l', 'q', 't', 'u', 'y', '{', '}']
Thanks for your solutions and if you can take the time to add comments to your code explaining what everything does I would greatly appreciate it!
You should really use the Counter
class instead of reinventing your own wheel.
charcount
is a dictionary, and dictionaries have no implicit sort order. Therefore, we'll have to convert it to a list, which can be sorted. Each entry in that list will be a tuple of count and character.
charcount.items()
already gives us a list that looks like [('\n', 1219), ('!', 6079)]
. Unfortunately, if we would sort this list, it would sort by character first and then (if characters were ever equal) by count instead of the other way round. Therefore, we need a key function to tell sort to look at count first, and then (if counts are equal) the character. Fortunately, our key function is really simple; it just swaps around the tuple:
lambda (char,count): (count, char)
Alternatively, we could use a list comprehension to swap the values, to get something like: [('\n', 1219), ('!', 6079)]
, then sort, and then swap the values again.
charcount_list = sorted(charcount.items(), key=lambda (char,count):(count, char))
charcount_list will now be:
[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('t', 1), ('u', 1), ('y', 1),
('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046),
('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112),
('#', 6115), (']', 6152), (' (', 6154), ('@', 6157), (')', 6186)]
If you want the reverse order, simply specify the reverse=True
argument to sorted.
>>> from operator import itemgetter
>>> sorted(charcount.items(), key=itemgetter(1))
[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('u', 1), ('t', 1), ('y', 1), ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046), ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112), ('#', 6115), (']', 6152), (' (', 6154), ('@', 6157), (')', 6186)]
charcount
is a dict
(dictionary). Iterating a dictionary iterates over it's keys, that's why sorted()
results in a sorted list of keys.
You need to get list of items then sort it by the second value:
sorted(charcount.items(), key=lambda t: t[1])
Dictionaries ( what {} means) are unordered collections. Which means you can't sort them in any kind of meaningful way. I suggest storing the information as a list of tuples [(), ...] and then sorting them based on that.
foo = [('a', 123), ('b', 345)]
def key_function(x):
return x[1]
sorted_list = sorted(foo, key_function)
print sorted_list
As you can see, sorted takes an optional second parameter. The purpose of that parameter is to provide a function that tells sorted how to sort something. All you're doing is breaking down the information in each tuple in the list to provide a value that can be ordered, since you can't really order a list of tuples in any meaningful way.
Make sense?
It can also be written like: print sorted(foo, key=lambda (x,y): y)
lambda just means an inline function with no name, and it allows you to break down the tuple in a different way.
You can see how this works by doing print [y for (x,y) in sorted_list]
You can even redefine the key function from before like this:
def key_function(x):
x,y = x
return y
BTW, I only put in the parentheses before for clarity. If you're not defining a function then the comma is the tuple constructor.
sorted(charcount.items(), key=lambda item: item[1])
Dictionary is iterated by key, so you get a sorted list of keys when you pass the dictionary to sorted
. Sort the dictionary's item tuples by value to get a list of sorted tuples.
sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])
If you're using Python 2.7+, then you can use the list of tuples to initialize an OrderedDict
, which will maintain the sorted order of item tuples.
精彩评论