Summarizing a dictionary of arrays in Python
I got the following dictionary:
mydict = {
'foo': [1,19,2,3,24,52,2,6], # sum: 109
'bar': [50,5,9,7,66,3,2,44], # sum: 186
'another': [1,2,3,4,5,6,7,8], # sum: 36
'entry': [0,0,0,2,99,4,33,55], # sum: 193
'onemore': [21,22,23,24,25,26,27,28] # sum: 196
}
I need to efficiently filter out and sort the top x entries by the sum of the array.
For example, the Top 3 sorted and filtered list for the example above would be
sorted_filtered_dict = {
'onemore': [21,22,23,24,25,26,27,28], # sum: 196
'开发者_StackOverflow中文版entry': [0,0,0,2,99,4,33,55], # sum: 193
'bar': [50,5,9,7,66,3,2,44] # sum: 186
}
I'm fairly new to Python, and tried it myself with chaining a sum and filter function on a lambda function, but struggled with the actual syntax.
It's easy to do with a sort:
sorted(mydict.iteritems(), key=lambda tup: sum(tup[1]), reverse=True)[:3]
This is reasonable if the ratio is similar to this one (3 / 5). If it's larger, you'll want to avoid the sort (O(n log n)), since top 3 can be done in O(n). For instance, using heapq, the heap module:
heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1]))
This is O(n + 3 log n), since assembly the initial heap is O(n) and re-heapifying is O(log n).
EDIT: If you're using Python 2.7 or later, you can easily convert to a OrderedDict
(equivalent version for Python 2.4+):
OrderedDict(heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1])))
OrderedDict
has the same API as dict
, but remembers insertion order.
For such a small slice it's not worth using islice
sorted(mydict.iteritems(), key=lambda (k,v): sum(v), reverse=True)[:3]
精彩评论