开发者

Summarizing a dictionary of arrays in Python

I got the following dictionary:

mydict = {
  'foo': [1,19,2,3,24,52,2,6],          # sum: 109
  'bar': [50,5,9,7,66,3,2,44],          # sum: 186
  'another': [1,2,3,4,5,6,7,8],         # sum:  36
  'entry': [0,0,0,2,99,4,33,55],        # sum: 193
  'onemore': [21,22,23,24,25,26,27,28]  # sum: 196
}

I need to efficiently filter out and sort the top x entries by the sum of the array.

For example, the Top 3 sorted and filtered list for the example above would be

sorted_filtered_dict = {
  'onemore': [21,22,23,24,25,26,27,28], # sum: 196
  '开发者_StackOverflow中文版entry': [0,0,0,2,99,4,33,55],        # sum: 193
  'bar': [50,5,9,7,66,3,2,44]           # sum: 186
}

I'm fairly new to Python, and tried it myself with chaining a sum and filter function on a lambda function, but struggled with the actual syntax.


It's easy to do with a sort:

sorted(mydict.iteritems(), key=lambda tup: sum(tup[1]), reverse=True)[:3]

This is reasonable if the ratio is similar to this one (3 / 5). If it's larger, you'll want to avoid the sort (O(n log n)), since top 3 can be done in O(n). For instance, using heapq, the heap module:

heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1]))

This is O(n + 3 log n), since assembly the initial heap is O(n) and re-heapifying is O(log n).

EDIT: If you're using Python 2.7 or later, you can easily convert to a OrderedDict (equivalent version for Python 2.4+):

OrderedDict(heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1])))

OrderedDict has the same API as dict, but remembers insertion order.


For such a small slice it's not worth using islice

sorted(mydict.iteritems(), key=lambda (k,v): sum(v), reverse=True)[:3]
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜