How to calculate the number of items per user, grouped by item

2023-02-25 04:26

How can I output a result like this:

user    I   R   H
=================
atl001  2   1   0
cms017  1   2   1
lhc003  0   1   2

from a list like this:

atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R

That is, I want to count the number of I, H and R entries per user. Note that I can't use groupby from itertools in this particular case. Thanks in advance for your help. Cheers!

data='''atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R'''

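# stats maps user -> {status: count}
stats={}
for i in data.split('\n'):
    user, irh = i.split()
    u = stats.setdefault(user, {})      # per-user dict, created on first sight
    u[irh] = u.setdefault(irh, 0) + 1   # bump the count for this status

print 'user   I R H'
for user in sorted(stats):
    stat = stats[user]
    # statuses a user never had default to 0
    print user, stat.get('I', 0), stat.get('R', 0), stat.get('H', 0)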

data = 112*'cms017 R\n'

data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'

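stats = {}                      # user -> [count of I, count of R, count of H]
d = {'I':0,'R':1,'H':2}         # status letter -> index in that list
L = 0                           # width of the longest user name
for line in data.splitlines():
    user,irh = line.split()
    stats.setdefault(user,[0,0,0])
    stats[user][d[irh]] += 1
    L = max(L, len(user))

# LL: number of digits in the largest count, shared by all three columns
LL = len(str(max(max(stats[user])
                 for user in stats )))

# cale: row template such as ' %3s %3s %3s'
cale = ' %%%ds %%%ds %%%ds' % (LL,LL,LL)
ch = 'user'.ljust(L) + cale % ('I','R','H')

print '%s\n%s' % (ch, len(ch)*'=')
print '\n'.join(user.ljust(L) + cale % tuple(stats[user])
                for user in sorted(stats.keys()))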

Result:

user        I   R   H
=====================
atl001      2   1   0
cms017      0 117   1
lhcabc003   0   1   2

Also:

data = 14*'cms017 R\n'

data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'

Y = {}      # (user, status) -> count
L = 0       # width of the longest user name
for line in data.splitlines():
    user,irh = line.split()
    L = max(L, len(user))
    if (user,irh) not in Y:
        # first sight of this user: create all three counters at once
        Y.update({(user,'I'):0,(user,'R'):0,(user,'H'):0})
    Y[(user,irh)] += 1

LL = len(str(max(x for x in Y.itervalues())))

cale = '%%-%ds %%%ds %%%ds %%%ds' % (L,LL,LL,LL)
ch = cale % ('user','I','R','H')

print '%s\n%s' % (ch, len(ch)*'=')
# sorted keys come in groups of three per user, ordered H, I, R
li = sorted(Y.keys())
print '\n'.join(cale % (a[0],Y[b],Y[c],Y[a])
                for a,b,c in (li[x:x+3] for x in xrange(0,len(li),3)))

Result:

user       I  R  H
==================
atl001     2  1  0
cms017     0 19  1
lhcabc003  0  1  2

PS:

The user names are all left-justified to a common width of L characters.

To avoid the complexity of Sebastian's code, my code right-justifies the I, R and H columns to one shared width LL, which is the number of digits in the largest count found in those columns.
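The two widths described in this PS could be computed along the lines of the sketch below. It assumes a modern Python 3 rather than the Python 2 used in the answer, and format_table together with the sample dictionary are hypothetical names introduced only for illustration.

```python
def format_table(stats, columns=("I", "R", "H")):
    """Return the table as a list of text lines.

    stats maps user -> {status: count}. Hypothetical helper, not the
    answer's original code.
    """
    # L in the answer: width of the user column. The header word "user"
    # is included here so that very short names still line up.
    user_width = max([len("user")] + [len(user) for user in stats])

    # LL in the answer: one shared width for every count column, taken
    # from the widest count (or column title, whichever is longer).
    cells = [str(row.get(col, 0)) for row in stats.values() for col in columns]
    count_width = max(len(text) for text in cells + list(columns))

    def fmt(user, values):
        # Left-justify the name, right-justify each count.
        return user.ljust(user_width) + "".join(
            " " + str(value).rjust(count_width) for value in values)

    header = fmt("user", columns)
    lines = [header, "=" * len(header)]
    for user in sorted(stats):
        lines.append(fmt(user, [stats[user].get(col, 0) for col in columns]))
    return lines


# Sample counts taken from the first result table in this answer.
sample = {
    "atl001": {"I": 2, "R": 1},
    "cms017": {"R": 117, "H": 1},
    "lhcabc003": {"R": 1, "H": 2},
}
print("\n".join(format_table(sample)))
```

Sharing one width across all count columns keeps the format string simple, at the cost of some extra padding when only one column holds large numbers.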

Well, using groupby for this problem makes no sense anyway. For starters, your data isn't sorted (groupby doesn't sort the groups for you), and the lines are very simple.
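To illustrate the point about sorting, here is a small sketch for a Python version where itertools.groupby is available (which is not the asker's situation). It assumes Python 3, and user_of is a hypothetical helper name.

```python
from itertools import groupby

lines = ["atl001 I", "atl001 I", "cms017 H", "atl001 R", "lhc003 H",
         "cms017 R", "cms017 I", "lhc003 H", "lhc003 R", "cms017 R"]


def user_of(line):
    """Return the user name, i.e. the first field of a line."""
    return line.split()[0]


# groupby only merges *consecutive* items with the same key, so on the
# unsorted input the same user appears in several separate groups.
unsorted_groups = [(user, len(list(group)))
                   for user, group in groupby(lines, key=user_of)]

# Sorting first brings each user's lines together: one group per user.
sorted_groups = [(user, len(list(group)))
                 for user, group in groupby(sorted(lines), key=user_of)]

print(unsorted_groups)
print(sorted_groups)
```

On the unsorted list, atl001 and cms017 each turn up more than once, which is why a plain running count is the simpler tool here.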

Just keep count as you process each line. I am assuming you don't know what flags you'll get:

from sets import Set as set # python2.3 compatibility
counts = {} # counts stored in user -> dict(flag=counter) nested dicts
flags = set()
# inputfile is assumed to be an open file (or any iterable of lines)
for line in inputfile:
    user, flag = line.strip().split()
    usercounts = counts.setdefault(user, {})
    usercounts[flag] = usercounts.setdefault(flag, 0) + 1
    flags.add(flag)

Printing the info after that is a question of iterating over your counts structure. I am assuming usernames are always 6 characters long:

flags = list(flags)
flags.sort()
users = counts.keys()
users.sort()
print "user    %s" % ('  '.join(flags))
print "=" * (6 + 3 * len(flags))
for user in users:
    line = [user]
    for flag in flags:
        # join() needs strings, so convert each count
        line.append(str(counts[user].get(flag, 0)))
    print '  '.join(line)

All code above is untested, but should roughly work.

Here's a variant that uses nested dicts to count job statuses and computes max field widths before printing:

#!/usr/bin/env python
import fileinput
from sets import Set as set # python2.3

# parse job statuses
counter = {}
for line in fileinput.input():
    user, jobstatus = line.split()
    d = counter.setdefault(user, {})
    d[jobstatus] = d.setdefault(jobstatus, 0) + 1

# print job statuses
# . find field widths
status_names = set([name for st in counter.itervalues() for name in st])
maxstatuslens = [max([len(str(i)) for st in counter.itervalues()
                      for n, i in st.iteritems()
                      if name == n])
                 for name in status_names]
maxuserlen = max(map(len, counter))
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens]))
# . print header
header = row_format % (("user",) + tuple(status_names))
print header
print '='*len(header)
# . print rows
for user, statuses in counter.iteritems():
    print row_format % (
        (user,) + tuple([statuses.get(name, 0) for name in status_names]))

Example

$ python print-statuses.py <input.txt
user   I H R
============
lhc003 0 2 1
cms017 1 1 2
atl001 2 0 1

Here's a variant that uses a flat dictionary with a tuple (user, status_name) as the key:

#!/usr/bin/env python
import fileinput
from sets import Set as set # python 2.3

# parse job statuses
counter = {}
maxstatuslens = {}
maxuserlen = 0
for line in fileinput.input():
    key = user, status_name = tuple(line.split())
    i = counter[key] = counter.setdefault(key, 0) + 1
    maxstatuslens[status_name] = max(maxstatuslens.setdefault(status_name, 0),
                                     len(str(i)))
    maxuserlen = max(maxuserlen, len(user))

# print job statuses
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens.itervalues()]))
# . print header
header = row_format % (("user",) + tuple(maxstatuslens))
print header
print '='*len(header)
# . print rows
for user in set([k[0] for k in counter]):
    print row_format % ((user,) +
        tuple([counter.get((user, status), 0) for status in maxstatuslens]))

The usage and output are the same.
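On a newer Python, the same flat (user, status_name) keying could be sketched with collections.Counter. This is not the code above, only an illustration that assumes Python 3; count_pairs and SAMPLE are hypothetical names.

```python
from collections import Counter

SAMPLE = """atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R"""


def count_pairs(lines):
    """Count (user, status_name) pairs; blank lines are skipped."""
    return Counter(tuple(line.split()) for line in lines if line.strip())


counter = count_pairs(SAMPLE.splitlines())

# Collect the distinct users from the first element of each key.
users = sorted({user for user, _ in counter})
for user in users:
    # A Counter returns 0 for keys it has never seen, so statuses a
    # user never had need no special handling.
    print(user, [counter[(user, status)] for status in ("I", "R", "H")])
```

The flat key avoids a second level of dictionaries, but listing the users means scanning all keys once.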

As a hint:

Use a nested dictionary structure for counting the occurrences:

user -> character -> occurrences of the character for user

Writing the parser code, incrementing the counters and printing the result is up to you... a good exercise.
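For readers who want to check their attempt, the user -> character -> occurrences structure could look roughly like the sketch below. It assumes Python 3 and collections.defaultdict, neither of which is available on Python 2.3, and count_per_user is a hypothetical name.

```python
from collections import defaultdict


def count_per_user(lines):
    """Build user -> character -> occurrences from 'user character' lines."""
    # The inner defaultdict(int) starts every new counter at 0; the outer
    # one creates the per-user dictionary on first access.
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user, character = line.split()
        counts[user][character] += 1
    return counts


lines = ["atl001 I", "atl001 I", "cms017 H", "atl001 R", "lhc003 H",
         "cms017 R", "cms017 I", "lhc003 H", "lhc003 R", "cms017 R"]
counts = count_per_user(lines)

# .get() reads a counter without creating it, so missing ones show as 0.
rows = [(user, [counts[user].get(ch, 0) for ch in "IRH"])
        for user in sorted(counts)]
for user, values in rows:
    print(user, *values)
```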
