
Best pythonic way to populate the list containing the date type data?

I have the following list data.

data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15',
6000.0], ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01',
4170.0], ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01',
6000.0], ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15',
9000.0], ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01',
6263.0], ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25',
6000.0]]

Each entry holds a date string first and a total second. I want the totals grouped by month and by year.

The result should look like:

--> for month: [['JAN',tot1],['FEB',tot2],['MAR',tot3] ...]
--> for year: [['2002',tot1],['2005',tot2],['2008',tot3] ...]


from collections import defaultdict

yeartotal = defaultdict(float)
monthtotal = defaultdict(float)
for s in data:
    d = s[0].split('-')
    yeartotal[d[0]] += s[1]
    monthtotal[d[1]] += s[1]


In [37]: sorted(yeartotal.items())
Out[37]: 
[('2002', 12000.0),
 ('2005', 6000.0),
 ('2008', 12000.0),
 ('2009', 15000.0),
 ('2010', 27563.0),
 ('2011', 34507.0)]

In [38]: sorted(monthtotal.items())
Out[38]: 
[('01', 3000.0),
 ('02', 14107.0),
 ('03', 6000.0),
 ('04', 14400.0),
 ('05', 15960.0),
 ('06', 3000.0),
 ('07', 6000.0),
 ('08', 10170.0),
 ('09', 6000.0),
 ('10', 6263.0),
 ('11', 4170.0),
 ('12', 18000.0)]
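
If you want the exact shape asked for in the question ([['JAN', tot1], ...] and [['2002', tot1], ...]), something like the following should get you there. It is only a sketch: group_totals and MONTHS are names I made up for illustration, and I hardcode the month labels rather than rely on the locale.

```python
from collections import defaultdict

# Hypothetical label table; hardcoded so the result does not depend on locale.
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']


def group_totals(rows):
    """Return (by_month, by_year), each as a list of [label, total] pairs."""
    yeartotal = defaultdict(float)
    monthtotal = defaultdict(float)
    for date_str, amount in rows:
        year, month, _day = date_str.split('-')
        yeartotal[year] += amount
        monthtotal[month] += amount
    # Zero-padded months and four-digit years sort correctly as strings.
    by_month = [[MONTHS[int(m) - 1], monthtotal[m]] for m in sorted(monthtotal)]
    by_year = [[y, yeartotal[y]] for y in sorted(yeartotal)]
    return by_month, by_year


sample = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0],
          ['2008-12-15', 6000.0], ['2009-04-20', 6000.0]]
by_month, by_year = group_totals(sample)
```

Months that never occur in the data are simply left out of by_month.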


First, let's transform the data into a more convenient form. We'll use the datetime module to handle those dates. The result is stored as a list so it can be traversed more than once.

>>> import datetime
>>> trans = lambda row: (datetime.datetime.strptime(row[0], "%Y-%m-%d"), row[1])
>>> tdata = list(map(trans, data))

Next, we need two functions, one per grouping. Each takes the dict of totals so far and one row, and returns a new dict with that row's value added to its group.

>>> def mker(left, right):
...     result = dict(left)
...     mo = right[0].strftime('%b')
...     result[mo] = right[1] + left.get(mo, 0)
...     return result
... 
>>> def yker(left, right):
...     result = dict(left)
...     yr = right[0].strftime('%Y')
...     result[yr] = right[1] + left.get(yr, 0)
...     return result
... 

Finally, we apply these kernel functions to the data using reduce() (in Python 3 it has to be imported from functools):

>>> from functools import reduce
>>> reduce(mker, tdata, {})
{'Apr': 14400.0,
 'Aug': 10170.0,
 'Dec': 18000.0,
 'Feb': 14107.0,
 'Jan': 3000.0,
 'Jul': 6000.0,
 'Jun': 3000.0,
 'Mar': 6000.0,
 'May': 15960.0,
 'Nov': 4170.0,
 'Oct': 6263.0,
 'Sep': 6000.0}
>>> reduce(yker, tdata, {})
{'2002': 12000.0,
 '2005': 6000.0,
 '2008': 12000.0,
 '2009': 15000.0,
 '2010': 27563.0,
 '2011': 34507.0}
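
The two kernels differ only in the strftime format, so they could be generated by one factory. This is just a sketch of that idea: make_kernel is a hypothetical helper, not part of the answer above, and I use '%m' instead of '%b' in the example so the keys do not depend on the locale.

```python
import datetime
from functools import reduce


def make_kernel(fmt):
    """Build a reduce() kernel that groups rows by when.strftime(fmt)."""
    def kernel(acc, row):
        when, amount = row
        key = when.strftime(fmt)
        result = dict(acc)  # copy, as mker/yker do, so the input dict is untouched
        result[key] = result.get(key, 0) + amount
        return result
    return kernel


rows = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2009-04-20', 6000.0]]
tdata = [(datetime.datetime.strptime(d, "%Y-%m-%d"), v) for d, v in rows]

by_year = reduce(make_kernel('%Y'), tdata, {})
by_month_number = reduce(make_kernel('%m'), tdata, {})
```

Copying the dict on every step keeps the kernel free of side effects, at the cost of extra work per row; for a large data set a plain loop that updates one dict in place is probably the cheaper choice.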


Riffing on Steve's answer:

>>> data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15',
... 6000.0], ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01',
... 4170.0], ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01',
... 6000.0], ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15',
... 9000.0], ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01',
... 6263.0], ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25',
... 6000.0]]
>>> from collections import defaultdict
>>> monthtotal = defaultdict(float)
>>> months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
...  'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
>>> for s in data:
...  monthtotal[months[int(s[0].split('-')[1]) - 1]] += s[1]
... 
>>> monthtotal
defaultdict(<type 'float'>, {'MAR': 6000.0, 'FEB': 14107.0, 'AUG': 10170.0, 'SEP': 6000.0, 'APR': 14400.0, 'JUN': 3000.0, 'JUL': 6000.0, 'JAN': 3000.0, 'MAY': 15960.0, 'NOV': 4170.0, 'DEC': 18000.0, 'OCT': 6263.0})


Another solution, without collections:

from datetime import datetime

getdate = lambda strd: (datetime.strptime(strd, '%Y-%m-%d').strftime('%Y-%b').split('-'))

data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15',
6000.0], ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01',
4170.0], ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01',
6000.0], ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15',
9000.0], ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01',
6263.0], ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25',
6000.0]]

yeartotal = {}
monthtotal = {}

for dateVal, total in map(lambda sdata: (getdate(sdata[0]), sdata[1]), data):
    if dateVal[0] not in yeartotal:
        yeartotal[dateVal[0]] = 0
    if dateVal[1] not in monthtotal:
        monthtotal[dateVal[1]] = 0
    yeartotal[dateVal[0]] += total
    monthtotal[dateVal[1]] += total
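
The "if key not in dict" checks can also be folded into dict.get() with a default. A small sketch of that variant follows; totals_without_collections is a made-up name, and note that here I key the dicts by the integer year and month rather than by the formatted strings used above.

```python
from datetime import datetime


def totals_without_collections(rows):
    """Sum totals per year and per month using plain dicts and dict.get()."""
    yeartotal, monthtotal = {}, {}
    for date_str, total in rows:
        parsed = datetime.strptime(date_str, '%Y-%m-%d')
        # get() returns 0 for a key that has not been seen yet.
        yeartotal[parsed.year] = yeartotal.get(parsed.year, 0) + total
        monthtotal[parsed.month] = monthtotal.get(parsed.month, 0) + total
    return yeartotal, monthtotal


rows = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2009-04-20', 6000.0]]
yeartotal, monthtotal = totals_without_collections(rows)
```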


Here's another solution, using numpy.

First, we need to reshape the data so it looks a bit like a matrix. We'll use a defaultdict with years as keys and lists of twelve monthly totals as values. Here tdata is a list of (datetime, total) pairs.

>>> import collections
>>> import numpy
>>> pre_matrix = collections.defaultdict(lambda:[0]*12)
>>> for row in tdata:
...     pre_matrix[row[0].year][row[0].month - 1] += row[1]
...     

Since we don't want an array containing every year since the start of the Common Era, let's examine the pre-formatted data and extract the minimum and maximum years.

>>> r = range(min(pre_matrix.keys()),1+max(pre_matrix.keys()))

Finally, build the matrix, with each row containing a single year's data.

>>> matrix = numpy.array([pre_matrix[y] for y in r])

From there, it's a simple matter to get the row and column sums. We'll use zip() to put the interesting date values back in.

>>> list(zip((datetime.datetime(1970, m, 1).strftime("%b") for m in range(1, 13)), matrix.sum(0).tolist()))
[('Jan', 3000.0),
 ('Feb', 14107.0),
 ('Mar', 6000.0),
 ('Apr', 14400.0),
 ('May', 15960.0),
 ('Jun', 3000.0),
 ('Jul', 6000.0),
 ('Aug', 10170.0),
 ('Sep', 6000.0),
 ('Oct', 6263.0),
 ('Nov', 4170.0),
 ('Dec', 18000.0)]

Since we don't need to localize the years, it's a bit simpler.

>>> list(zip(r, matrix.sum(1)))
[(2002, 12000.0),
 (2003, 0.0),
 (2004, 0.0),
 (2005, 6000.0),
 (2006, 0.0),
 (2007, 0.0),
 (2008, 12000.0),
 (2009, 15000.0),
 (2010, 27563.0),
 (2011, 34507.0)]
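
For anyone who wants to try the year-by-month grid idea without installing numpy, here is a rough standard-library stand-in. It is not the numpy code above: year_month_grid is a hypothetical name, rows of the grid are plain lists, and zip(*matrix) plays the role of the column sum.

```python
from datetime import datetime


def year_month_grid(rows):
    """Return (year_sums, month_sums) from a dense year x month grid."""
    parsed = [(datetime.strptime(d, '%Y-%m-%d'), v) for d, v in rows]
    first = min(when.year for when, _ in parsed)
    last = max(when.year for when, _ in parsed)
    years = range(first, last + 1)
    # One row of twelve monthly totals per year, including empty years.
    grid = {y: [0.0] * 12 for y in years}
    for when, amount in parsed:
        grid[when.year][when.month - 1] += amount
    matrix = [grid[y] for y in years]
    year_sums = [(y, sum(row)) for y, row in zip(years, matrix)]
    month_sums = [sum(col) for col in zip(*matrix)]  # column-wise totals
    return year_sums, month_sums


rows = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2009-04-20', 6000.0]]
year_sums, month_sums = year_month_grid(rows)
```

As with the numpy version, years with no data inside the range show up with a total of 0.0.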