In Python, how do you conserve grouping when you sort by one value and then another?

2023-03-11 22:12

Data looks like this:

Idx  score  group
5    0.85   Europe
8    0.77   Australia
12   0.70   S.America
13   0.71   Australia
42   0.82   Europe
45   0.90   Asia
65   0.91   Asia
73   0.72   S.America
77   0.84   Asia

Needs to look like this:

Idx  score  group
65   0.91   Asia
77   0.84   Asia
45   0.73   Asia
12   0.87   S.America
73   0.72   S.America
5    0.85   Europe
42   0.82   Europe
8    0.83   Australia
13   0.71   Australia

See how Asia has the highest score, so all of Asia's scores come first, followed by the group with the second-highest score, and so on? I need to do this in Python. It's very different from sorting by one element and then by another. Please help. Sorry if this question is redundant; I barely know how to ask it, let alone search for it.

I had it as a dictionary, dict = {5: [0.85, 'Europe'], 8: [0.77, 'Australia'], ...}, and I wrote a function that tried to parse the data:

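from operator import itemgetter

def sortResults(dict):
   newDict = {}
   for k,v in dict.items():
      if v[-1] in newDict:
         newDict[v[-1]].append((k,float(v[0]),v[1]))
      else:
         newDict[v[-1]] = [(k,float(v[0]),v[1])]
   for k in newDict.keys():
      for resList in newDict[k]:
         # resList is a single (k, score, group) tuple, so sorted() applies
         # itemgetter(1) to a bare number and raises TypeError (not subscriptable)
         resList = sorted(resList,key=itemgetter(1),reverse=True)
   return newDict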

It says the float is unsubscriptable...I'm just confused.
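For comparison, here is a minimal sketch of what sortResults appears to be aiming for. It assumes the input dictionary maps each Idx to [score, group] as described above; the name sort_results and the returned list of (idx, score, group) tuples are illustrative choices, not part of the question. The key difference is that it sorts each group's list of tuples rather than each individual tuple, and then orders the groups by their best score.

```python
def sort_results(data):
    """Order rows by their group's best score, then by score within each group.

    Assumes data maps idx -> [score, group], e.g. {5: [0.85, 'Europe']}.
    """
    groups = {}
    for idx, (score, group) in data.items():
        # collect (idx, score, group) tuples per group
        groups.setdefault(group, []).append((idx, float(score), group))

    # sort each group's *list* of tuples, highest score first
    for rows in groups.values():
        rows.sort(key=lambda row: row[1], reverse=True)

    # after the per-group sort, rows[0] holds each group's best score
    ordered = sorted(groups.values(), key=lambda rows: rows[0][1], reverse=True)
    # flatten the groups back into one list of rows
    return [row for rows in ordered for row in rows]


data = {5: [0.85, 'Europe'], 8: [0.77, 'Australia'], 12: [0.70, 'S.America'],
        13: [0.71, 'Australia'], 42: [0.82, 'Europe'], 45: [0.90, 'Asia'],
        65: [0.91, 'Asia'], 73: [0.72, 'S.America'], 77: [0.84, 'Asia']}

for idx, score, group in sort_results(data):
    print(idx, score, group)
```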

I would just populate a dictionary with the maximum per group, and then sort on group maximum followed by individual score. Like this:

data = [
  (5 , 0.85, "Europe"),
  (8 , 0.77, "Australia"),
  (12, 0.70, "S.America"),
  (13, 0.71, "Australia"),
  (42, 0.82, "Europe"),
  (45, 0.90, "Asia"),
  (65, 0.91, "Asia"),
  (73, 0.72, "S.America"),
  (77, 0.84, "Asia")
]

maximums_by_group = dict()

for indx, score, group in data:
    if group not in maximums_by_group or maximums_by_group[group] < score:
        maximums_by_group[group] = score

data.sort(key=lambda e: (maximums_by_group[e[2]], e[1]), reverse=True)

for indx, score, group in data:
    print(indx, score, group)

With the input data above, this prints:

65 0.91 Asia
45 0.9 Asia
77 0.84 Asia
5 0.85 Europe
42 0.82 Europe
8 0.77 Australia
13 0.71 Australia
73 0.72 S.America
12 0.7 S.America
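One caveat with a key of (group maximum, score): if two groups share the same maximum, nothing in the key separates them, so their rows can interleave. A sketch of one fix, assuming ties should be broken alphabetically by group name: negate the numeric parts instead of using reverse=True, so the group name can sort ascending in the middle of the key. The sample rows below are made up purely to force a tie.

```python
data = [
    (1, 0.90, "Asia"),
    (2, 0.80, "Europe"),
    (3, 0.90, "Europe"),  # made-up row: Europe's best ties with Asia's
    (4, 0.85, "Asia"),
]

maximums_by_group = {}
for indx, score, group in data:
    if group not in maximums_by_group or maximums_by_group[group] < score:
        maximums_by_group[group] = score

# With key (max, score) and reverse=True the order would be 1, 3, 4, 2,
# mixing Asia and Europe. Putting the group name between the two negated
# scores keeps each group's rows together.
data.sort(key=lambda e: (-maximums_by_group[e[2]], e[2], -e[1]))

for indx, score, group in data:
    print(indx, score, group)
```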

I think there's a better way to iterate than what I have here, but this works:

from operator import itemgetter

dataset = [
    { 'idx': 5, 'score': 0.85, 'group': 'Europe' },
    { 'idx': 8, 'score': 0.77, 'group': 'Australia' },
    { 'idx': 12, 'score': 0.70, 'group': 'S.America' },
    { 'idx': 13, 'score': 0.71, 'group': 'Australia' },
    { 'idx': 42, 'score': 0.82, 'group': 'Europe' },
    { 'idx': 45, 'score': 0.90, 'group': 'Asia' },
    { 'idx': 65, 'score': 0.91, 'group': 'Asia' },
    { 'idx': 73, 'score': 0.72, 'group': 'S.America' },
    { 'idx': 77, 'score': 0.84, 'group': 'Asia' }
]

score_sorted = sorted(dataset, key=itemgetter('score'), reverse=True)

group_score_sorted = []
groups_completed = set()
for row in score_sorted:
    group_name = row['group']
    if group_name not in groups_completed:
        # a group is first met at its best score, so groups
        # are emitted in order of their best score
        groups_completed.add(group_name)

        for other in score_sorted:
            if other['group'] == group_name:
                group_score_sorted.append(other)

# group_score_sorted now contains the sorted list
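On the point that there is probably a better way to iterate: here is a sketch of a single pass over score_sorted, relying on dicts keeping insertion order (guaranteed since Python 3.7). Because rows arrive in descending score order, each group is inserted when its best row is seen, so the groups come out ordered by best score and each group's rows stay in descending order. The name buckets is an illustrative choice.

```python
from operator import itemgetter

dataset = [
    {'idx': 5, 'score': 0.85, 'group': 'Europe'},
    {'idx': 8, 'score': 0.77, 'group': 'Australia'},
    {'idx': 12, 'score': 0.70, 'group': 'S.America'},
    {'idx': 13, 'score': 0.71, 'group': 'Australia'},
    {'idx': 42, 'score': 0.82, 'group': 'Europe'},
    {'idx': 45, 'score': 0.90, 'group': 'Asia'},
    {'idx': 65, 'score': 0.91, 'group': 'Asia'},
    {'idx': 73, 'score': 0.72, 'group': 'S.America'},
    {'idx': 77, 'score': 0.84, 'group': 'Asia'},
]

score_sorted = sorted(dataset, key=itemgetter('score'), reverse=True)

# group -> rows, in the order each group is first seen (i.e. at its best score)
buckets = {}
for row in score_sorted:
    buckets.setdefault(row['group'], []).append(row)

group_score_sorted = [row for rows in buckets.values() for row in rows]

for row in group_score_sorted:
    print(row['idx'], row['score'], row['group'])
```

This visits each row once after the initial sort, instead of rescanning score_sorted once per group.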

I think the easiest way is to separate the rows by group first and then sort in two steps (first on each group's maximum, then on score within each group).

data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

groups = {}
for idx, score, group in data:
    try:
        groups[group].append((idx, score, group))
    except KeyError:
        groups[group] = [(idx, score, group)]

# groups ordered by their best score, highest first
for name in sorted(groups, key=lambda g: -max(x[1] for x in groups[g])):
    # rows within the group, highest score first
    for idx, score, group in sorted(groups[name], key=lambda row: -row[1]):
        print(idx, score, group)

The final result is

65 0.91 Asia
45 0.9  Asia
77 0.84 Asia
 5 0.85 Europe
42 0.82 Europe
 8 0.77 Australia
13 0.71 Australia
73 0.72 S.America
12 0.7  S.America

That is different from what you provided, but I think the expected results in your question contain a typo: the score 0.87 for S.America does not appear anywhere in the input data (nor do 0.73 for Asia or 0.83 for Australia).

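The easiest way to do this is to put the data into a list, because a Python dictionary can't be sorted in place. Then rely on Python's built-in sort (Timsort), which is stable: items that compare equal keep their existing relative order, so a later sort preserves the ordering produced by an earlier one.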

So your code would be something like this:

data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

data.sort(key=lambda x: x[1], reverse=True)
data.sort(key=lambda x: x[2].upper())

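Afterwards, data holds the rows below. Note that the groups come out in alphabetical order, because the last sort is on the group name:

[65, 0.91, 'Asia']
[45, 0.9, 'Asia']
[77, 0.84, 'Asia']
[8, 0.77, 'Australia']
[13, 0.71, 'Australia']
[5, 0.85, 'Europe']
[42, 0.82, 'Europe']
[73, 0.72, 'S.America']
[12, 0.7, 'S.America']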
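To get the groups ordered by their best score rather than alphabetically, the same two-pass idea still works if the second pass sorts on each group's maximum instead of its name. list.sort() is guaranteed to be stable, and reverse=True preserves that stability, so rows with the same group maximum keep their descending score order from the first pass. A sketch, assuming the per-group maximum is precomputed into a dict named best (an illustrative name):

```python
data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

# best score per group
best = {}
for idx, score, group in data:
    best[group] = max(score, best.get(group, score))

# pass 1: highest score first
data.sort(key=lambda x: x[1], reverse=True)
# pass 2: stable sort on the group's best score; rows with equal keys keep
# their pass-1 order, so scores stay descending inside each group
data.sort(key=lambda x: best[x[2]], reverse=True)

for row in data:
    print(row)
```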

I like itertools and operator:

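from itertools import groupby
from operator import itemgetter

def sort_by_max(a_list):
    # itemgetter(1) and itemgetter(2) read the score and group columns
    _, score, group = map(itemgetter, range(3))
    # groupby only merges adjacent rows, so bring each group's rows together first
    a_list.sort(key=group)
    # best (highest) score in each group
    max_score = {
        name: max(map(score, rows))
        for name, rows in groupby(a_list, group)
    }
    # groups by best score, then rows by score, both descending
    a_list.sort(key=lambda x: (-max_score[group(x)], -score(x)))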

Used like this:

the_list = [
    [5, 0.85, 'Europe'],
    [8, 0.77, 'Australia'],
    [12, 0.87, 'S.America'],
    [13, 0.71, 'Australia'],
    [42, 0.82, 'Europe'],
    [45, 0.90, 'Asia'],
    [65, 0.91, 'Asia'],
    [73, 0.72, 'S.America'],
    [77, 0.84, 'Asia']
]
sort_by_max(the_list)
for each in the_list:
    print('{0:2} : {1:<4} : {2}'.format(*each))

gives:

65 : 0.91 : Asia
45 : 0.9  : Asia
77 : 0.84 : Asia
12 : 0.87 : S.America
73 : 0.72 : S.America
 5 : 0.85 : Europe
42 : 0.82 : Europe
 8 : 0.77 : Australia
13 : 0.71 : Australia

[EDIT]

Come to think of it, I also like defaultdict and max:

from collections import defaultdict

def sort_by_max(a_list):
    # start at -inf so any real score (even a negative one) replaces the default
    max_score = defaultdict(lambda: float('-inf'))
    for index, score, group in a_list:
        max_score[group] = max(score, max_score[group])
    a_list.sort(key=lambda row: (-max_score[row[2]], -row[1]))
