In Python, how do you conserve grouping when you sort by one value and then another?
Data looks like this:
Idx score group
5 0.85 Europe
8 0.77 Australia
12 0.70 S.America
13 0.71 Australia
42 0.82 Europe
45 0.90 Asia
65 0.91 Asia
73 0.72 S.America
77 0.84 Asia
Needs to look like this:
Idx score group
65 0.91 Asia
77 0.84 Asia
45 0.73 Asia
12 0.87 S.America
73 0.72 S.America
5 0.85 Europe
42 0.82 Europe
8 0.83 Australia
13 0.71 Australia
See how Asia has the highest score and it shows me all of Asia's scores, then it's followed by the group which has the 2nd highest score and so on? I need to do this in Python. It's very different than sorting by one element and then sorting by another. Please help. Sorry if this question is redundant. I barely know how to ask it, let alone search for it.
I had it as a dictionary so that dict = {5:[0.85,Europe],8:[0.77,Australia]...} And I made a function that tried to parse the data:
def sortResults(dict):
    newDict = {}
    for k,v in dict.items():
        if v[-1] in newDict:
            sorDic[v[-1]].append((k,float(v[0]),v[1]))
        else:
            newDict[v[-1]] = [(k,float(v[0]),v[1])]
    for k in newDict.keys():
        for resList in newDict[k]:
            resList = sorted(resList,key=itemgetter(1),reverse=True)
    return sorDic
It says the float is unsubscriptable...I'm just confused.
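A likely cause of that error, assuming each dictionary value is a `[score, group]` list as described: the inner loop calls `sorted` on one `(idx, score, group)` tuple instead of on the group's list of tuples, so `itemgetter(1)` is applied to bare numbers. The sketch below reproduces the failure and shows one possible repair. The name `sort_results` and the sample dictionary are illustrative, not from the original code.

```python
from operator import itemgetter

# Sorting a single row applies itemgetter(1) to each number inside it,
# and numbers cannot be indexed, hence the TypeError.
try:
    sorted((5, 0.85, "Europe"), key=itemgetter(1))
except TypeError as exc:
    error_message = str(exc)


def sort_results(results):
    """Group rows by region, then sort each group's rows by score, descending."""
    grouped = {}
    for idx, (score, group) in results.items():
        grouped.setdefault(group, []).append((idx, float(score), group))
    for group in grouped:
        # Sort the whole list of rows, not each individual row tuple.
        grouped[group] = sorted(grouped[group], key=itemgetter(1), reverse=True)
    return grouped


# Hypothetical sample input in the shape the question describes.
sample = {5: [0.85, "Europe"], 8: [0.77, "Australia"], 42: [0.82, "Europe"]}
result = sort_results(sample)
```

This only sorts within each group; ordering the groups themselves by their best score still needs a second step.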
I would just populate a dictionary with the maximum per group, and then sort on group maximum followed by individual score. Like this:
data = [
    (5 , 0.85, "Europe"),
    (8 , 0.77, "Australia"),
    (12, 0.70, "S.America"),
    (13, 0.71, "Australia"),
    (42, 0.82, "Europe"),
    (45, 0.90, "Asia"),
    (65, 0.91, "Asia"),
    (73, 0.72, "S.America"),
    (77, 0.84, "Asia")
]

# Record the highest score seen in each group.
maximums_by_group = dict()
for indx, score, group in data:
    if group not in maximums_by_group or maximums_by_group[group] < score:
        maximums_by_group[group] = score

# Sort by group maximum first, then by individual score, both descending.
data.sort(key=lambda e: (maximums_by_group[e[2]], e[1]), reverse=True)

for indx, score, group in data:
    print(indx, score, group)
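This groups the rows as requested. With the input above it prints:
65 0.91 Asia
45 0.9 Asia
77 0.84 Asia
5 0.85 Europe
42 0.82 Europe
8 0.77 Australia
13 0.71 Australia
73 0.72 S.America
12 0.7 S.America
(The scores 0.73, 0.87 and 0.83 in the question's desired output do not occur in the input data, so they cannot appear here.)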
I think there's a better way to iterate than what I have here, but this works:
from operator import itemgetter

dataset = [
    { 'idx': 5, 'score': 0.85, 'group': 'Europe' },
    { 'idx': 8, 'score': 0.77, 'group': 'Australia' },
    { 'idx': 12, 'score': 0.70, 'group': 'S.America' },
    { 'idx': 13, 'score': 0.71, 'group': 'Australia' },
    { 'idx': 42, 'score': 0.82, 'group': 'Europe' },
    { 'idx': 45, 'score': 0.90, 'group': 'Asia' },
    { 'idx': 65, 'score': 0.91, 'group': 'Asia' },
    { 'idx': 73, 'score': 0.72, 'group': 'S.America' }
]

# Sort every row by score, highest first.
score_sorted = sorted(dataset, key=itemgetter('score'), reverse=True)
group_score_sorted = []
groups_completed = []
for score in score_sorted:
    group_name = score['group']
    if group_name not in groups_completed:
        # First (and therefore best) row of a new group: emit the whole group.
        groups_completed.append(group_name)
        for group in score_sorted:
            if group['group'] == group_name:
                group_score_sorted.append(group)
# group_score_sorted now contains the sorted list
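The nested loop above rescans the whole list once per group. One possible alternative, offered as a sketch rather than as this answer's method, records each group's best score first and then sorts once with a composite key. The name `best_by_group` is illustrative, and including the group name in the key is an assumed tie-breaker so that groups sharing the same best score do not interleave.

```python
dataset = [
    {'idx': 5, 'score': 0.85, 'group': 'Europe'},
    {'idx': 8, 'score': 0.77, 'group': 'Australia'},
    {'idx': 12, 'score': 0.70, 'group': 'S.America'},
    {'idx': 13, 'score': 0.71, 'group': 'Australia'},
    {'idx': 42, 'score': 0.82, 'group': 'Europe'},
    {'idx': 45, 'score': 0.90, 'group': 'Asia'},
    {'idx': 65, 'score': 0.91, 'group': 'Asia'},
    {'idx': 73, 'score': 0.72, 'group': 'S.America'},
]

# Highest score per group, found in a single pass.
best_by_group = {}
for row in dataset:
    group = row['group']
    best_by_group[group] = max(row['score'], best_by_group.get(group, row['score']))

# Negated numbers give descending order for the group's best score and the row's score.
group_score_sorted = sorted(
    dataset,
    key=lambda row: (-best_by_group[row['group']], row['group'], -row['score']),
)
```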
I think the easiest way is to separate the rows by group first and then do the sort in two steps (first sort the groups on each group's maximum, then sort on score inside each group).
data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

# Collect the rows of each group.
groups = {}
for idx, score, group in data:
    try:
        groups[group].append((idx, score, group))
    except KeyError:
        groups[group] = [(idx, score, group)]

# Outer sort: groups by their maximum score, descending.
for group in sorted((group for group in groups.keys()),
                    key = lambda g : -max(x[1] for x in groups[g])):
    # Inner sort: rows of the group by score, descending.
    for idx, score, group in sorted(groups[group], key = lambda g : -g[1]):
        print(idx, score, group)
The final result is
65 0.91 Asia
45 0.9 Asia
77 0.84 Asia
5 0.85 Europe
42 0.82 Europe
8 0.77 Australia
13 0.71 Australia
73 0.72 S.America
12 0.7 S.America
This is different from what you provided, but I think the desired output in your question has a typo, because the score 0.87 for S.America is not present anywhere in the input data.
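One way to check this is to compare the rows of the desired output against the input, as in the sketch below. Both collections are transcribed from the question; the variable names are illustrative. The comparison flags three rows whose scores do not occur in the input, not only the S.America one.

```python
# Input rows, as (idx, score, group), copied from the question.
input_rows = {
    (5, 0.85, "Europe"), (8, 0.77, "Australia"), (12, 0.70, "S.America"),
    (13, 0.71, "Australia"), (42, 0.82, "Europe"), (45, 0.90, "Asia"),
    (65, 0.91, "Asia"), (73, 0.72, "S.America"), (77, 0.84, "Asia"),
}

# Desired output rows, copied from the question.
expected_rows = [
    (65, 0.91, "Asia"), (77, 0.84, "Asia"), (45, 0.73, "Asia"),
    (12, 0.87, "S.America"), (73, 0.72, "S.America"),
    (5, 0.85, "Europe"), (42, 0.82, "Europe"),
    (8, 0.83, "Australia"), (13, 0.71, "Australia"),
]

# Rows of the desired output that cannot be produced from the input.
mismatches = [row for row in expected_rows if row not in input_rows]
```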
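The easiest way to do this is to dump the data into a list, because Python dictionaries are not sorted. Then use Python's built-in sort. It uses the Timsort algorithm, which is stable: rows that compare equal keep their relative order, so the ordering from an earlier sort survives inside each group produced by a later sort.
So your code would be something like this: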
data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]
data.sort(key=lambda x: x[1], reverse=True)
data.sort(key=lambda x: x[2].upper())
This will produce the following, with the groups in alphabetical order and the scores descending within each group:
[65, 0.91, 'Asia']
[45, 0.90, 'Asia']
[77, 0.84, 'Asia']
[8, 0.77, 'Australia']
[13, 0.71, 'Australia']
[5, 0.85, 'Europe']
[42, 0.82, 'Europe']
[73, 0.72, 'S.America']
[12, 0.70, 'S.America']
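The same two-pass idea can order the groups by their best score instead of alphabetically. The following is a minimal sketch, not part of the answer above: the dictionary `best` is an illustrative helper, and the group name is added to the second key as an assumed tie-breaker in case two groups share the same best score.

```python
data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

# Best score per group.
best = {}
for _, score, group in data:
    best[group] = max(score, best.get(group, score))

# Pass 1: order every row by score, highest first.
data.sort(key=lambda row: row[1], reverse=True)
# Pass 2: order by the group's best score. Because the sort is stable,
# rows of the same group keep the score order established in pass 1.
data.sort(key=lambda row: (best[row[2]], row[2]), reverse=True)
```

Stability also holds with `reverse=True`, which is why the second pass does not disturb the score order inside a group.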
I like itertools and operator:
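from itertools import groupby
from operator import itemgetter

def sort_by_max(a_list):
    # Accessor functions for the three columns of a row.
    index, score, group = map(itemgetter, range(3))
    # groupby only merges adjacent rows, so sort by group first.
    a_list.sort(key=group)
    # Highest score in each group.
    max_score = dict(
        (each, max(map(score, entries)))
        for each, entries in groupby(a_list, group)
    )
    # Groups by their best score, then rows by score, both descending.
    a_list.sort(key=lambda x: (-max_score[group(x)], -score(x)))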
Used like this:
the_list = [
    [5, 0.85, 'Europe'],
    [8, 0.77, 'Australia'],
    [12, 0.87, 'S.America'],
    [13, 0.71, 'Australia'],
    [42, 0.82, 'Europe'],
    [45, 0.90, 'Asia'],
    [65, 0.91, 'Asia'],
    [73, 0.72, 'S.America'],
    [77, 0.84, 'Asia']
]
sort_by_max(the_list)
for each in the_list:
    print('{0:2} : {1:<4} : {2}'.format(*each))
gives:
65 : 0.91 : Asia
45 : 0.9 : Asia
77 : 0.84 : Asia
12 : 0.87 : S.America
73 : 0.72 : S.America
5 : 0.85 : Europe
42 : 0.82 : Europe
8 : 0.77 : Australia
13 : 0.71 : Australia
[EDIT]
Come to think of it, I also like defaultdict and max:
from collections import defaultdict

def sort_by_max(a_list):
    # Highest score per group; the default of 0.0 assumes scores are not negative.
    max_score = defaultdict(float)
    for index, score, group in a_list:
        max_score[group] = max(score, max_score[group])
    # Groups by their best score, then rows by score, both descending.
    a_list.sort(key=lambda row: (-max_score[row[2]], -row[1]))