Python: merging tally data
Okay - I'm sure this has been answered here before but I can't find it....
My problem: I have a list of lists with this composition
0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C
My goal is to output the following:
0.6 A
0.3 B
开发者_如何学Python0.7 C
In other words, I need to merge the data from multiple lines together.
Here's the code I'm using:
unique_percents = []
for line in percents:
new_percent = float(line[0])
for inner_line in percents:
if line[1] == inner_line[1]:
new_percent += float(inner_line[0])
else:
temp = []
temp.append(new_percent)
temp.append(line[1])
unique_percents.append(temp)
break
I think it should work, but it's not adding the percents up and still has the duplicates. Perhaps I'm not understanding how "break" works?
I'll also take suggestions of a better loop structure or algorithm to use. Thanks, David.
You want to use a dict, but collections.defaultdict
can come in really handy here so that you don't have to worry about whether the key exists in the dict or not -- it just defaults to 0.0:
import collections
lines = [[0.2, 'A'], [0.1, 'A'], [0.3, 'A'], [0.3, 'B'], [0.2, 'C'], [0.5, 'C']]
amounts = collections.defaultdict(float)
for amount, letter in lines:
amounts[letter] += amount
for letter, amount in sorted(amounts.iteritems()):
print amount, letter
Try this out:
result = {}
for line in percents:
value, key = line
result[key] = result.get(key, 0) + float(value)
total = {}
data = [('0.1', 'A'), ('0.2', 'A'), ('.3', 'B'), ('.4', 'B'), ('-10', 'C')]
for amount, key in data:
total[key] = total.get(key, 0.0) + float(amount)
for key, amount in total.items():
print key, amount
Since all of the letter grades are grouped together, you can use itertools.groupby (and if not, just sort the list ahead of time to make them so):
data = [
[0.2, 'A'],
[0.1, 'A'],
[0.3, 'A'],
[0.3, 'B'],
[0.2, 'C'],
[0.5, 'C'],
]
from itertools import groupby
summary = dict((k, sum(i[0] for i in items))
for k,items in groupby(data, key=lambda x:x[1]))
print summary
Gives:
{'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999}
If you have a list of lists like this:
[ [0.2, A], [0.1, A], ...]
(in fact it looks like a list of tuples :)
res_dict = {}
for pair in lst:
letter = pair[1]
val = pair[0]
try:
res_dict[letter] += val
except KeyError:
res_dict[letter] = val
res_lst = [(val, letter) for letter, val in res_dict] # note, a list of tuples!
Using collections.defaultdict
to tally values
(assuming text data in d
):
>>> s=collections.defaultdict(float)
>>> for ln in d:
... v,k=ln.split()
... s[k] += float(v)
>>> s
defaultdict(<type 'float'>, {'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999})
>>> ["%s %s" % (v,k) for k,v in s.iteritems()]
['0.6 A', '0.7 C', '0.3 B']
>>>
If you are using Python 3.1 or newer, you can use collections.Counter. Also I suggest using decimal.Decimal instead of floats:
# Counter requires python 3.1 and newer
from collections import Counter
from decimal import Decimal
lines = ["0.2 A", "0.1 A", "0.3 A", "0.3 B", "0.2 C", "0.5 C"]
results = Counter()
for line in lines:
percent, label = line.split()
results[label] += Decimal(percent)
print(results)
The result is:
Counter({'C': Decimal('0.7'), 'A': Decimal('0.6'), 'B': Decimal('0.3')})
This is verbose, but works:
# Python 2.7
lines = """0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C"""
lines = lines.split('\n')
#print(lines)
pctg2total = {}
thing2index = {}
index = 0
for line in lines:
pctg, thing = line.split()
pctg = float(pctg)
if thing not in thing2index:
thing2index[thing] = index
index = index + 1
pctg2total[thing] = pctg
else:
pctg2total[thing] = pctg2total[thing] + pctg
output = ((pctg2total[thing], thing) for thing in pctg2total)
# Let's sort by the first occurrence.
output = list(sorted(output, key = lambda thing: thing2index[thing[1]]))
print(output)
>>>
[(0.60000000000000009, 'A'), (0.29999999999999999, 'B'), (0.69999999999999996, 'C')]
letters = {}
for line in open("data", "r"):
lineStrip = line.strip().split()
percent = float(lineStrip[0])
letter = lineStrip[1]
if letter in letters:
letters[letter] = percent + letters[letter]
else:
letters[letter] = percent
for letter, percent in letters.items():
print letter, percent
A 0.6
C 0.7
B 0.3
Lets say we have this
data =[(b, float(a)) for a,b in
(line.split() for line in
"""
0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C""".splitlines()
if line)]
print data
# [('A', 0.2), ('A', 0.1), ('A', 0.3), ('B', 0.3), ('C', 0.2), ('C', 0.5)]
You can now just go though this and sum
counter = {}
for letter, val in data:
if letter in counter:
counter[letter]+=val
else:
counter[letter]=val
print counter.items()
Or group values together and use sum:
from itertools import groupby
# you want the name and the sum of the values
print [(name, sum(value for k,value in grp))
# from each group
for name, grp in
# where the group name of a item `p` is given by `p[0]`
groupby(sorted(data), key=lambda p:p[0])]
>>> from itertools import groupby, imap
>>> from operator import itemgetter
>>> data = [['0.2', 'A'], ['0.1', 'A'], ['0.3', 'A'], ['0.3', 'B'], ['0.2', 'C'], ['0.5', 'C']]
>>> # data = sorted(data, key=itemgetter(1))
...
>>> for k, g in groupby(data, key=itemgetter(1)):
... print sum(imap(float, imap(itemgetter(0), g))), k
...
0.6 A
0.3 B
0.7 C
>>>
精彩评论