Python: Count lines and differentiate between them

I'm using an application that prints a timestamped line each time something happens. I want to copy and paste that output into my program and have it count how many times the event happens in each minute.

An example output is this:

13:48 An event happened.
13:48 Another event happened.
13:49 A new event happened.
13:49 A random event happened.
13:49 An event happened.

So the program would need to work out that 2 things happened at 13:48 and 3 at 13:49. I'm not sure how the information should be stored, but afterwards I need to average the counts to find how often it happens per minute. Sorry for being so complicated!


You could just use the time as a dictionary key that maps to a list of event messages. The length of each list gives you the number of events in that minute, while still letting you get at the specific events themselves:

>>> from pprint import pprint
>>> from collections import defaultdict
>>> events = defaultdict(list)
>>> with open('log.txt') as f:
...     for line in f:
...         if not line.strip():
...             continue  # skip blank lines, which would break the unpacking below
...         time, message = line.strip().split(None, 1)
...         events[time].append(message)
... 
>>> pprint(dict(events)) # pprint handles defaultdicts poorly
{'13:48': ['An event happened.', 'Another event happened.'],
 '13:49': ['A new event happened.',
           'A random event happened.',
           'An event happened.']}

If you want to be extra fancy, you could parse the time into a time object.
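A minimal sketch of that idea, assuming every non-blank line starts with a zero-padded HH:MM timestamp and all lines come from the same day. The function name group_by_minute is just for illustration. Parsing with datetime.strptime rejects malformed stamps with a ValueError and makes the keys sort as real clock times.

```python
from collections import defaultdict
from datetime import datetime

def group_by_minute(lines):
    """Group log messages by their HH:MM timestamp, keyed by datetime.time."""
    events = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline in pasted text
        stamp, message = line.strip().split(None, 1)
        # strptime gives a datetime on 1900-01-01; .time() keeps only the clock time
        events[datetime.strptime(stamp, "%H:%M").time()].append(message)
    return dict(events)

log = """13:48 An event happened.
13:48 Another event happened.
13:49 A new event happened.
13:49 A random event happened.
13:49 An event happened.
"""

events = group_by_minute(log.splitlines())
for minute in sorted(events):
    print(minute.strftime("%H:%M"), len(events[minute]))  # 13:48 2, then 13:49 3
```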

Edit: Took Mike Graham's suggestions into account.


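If you just want a count of how many events happen each minute, you don't really need Python; you can do it from the shell. Because uniq -c only merges adjacent identical lines, this relies on the log being in time order (otherwise put sort before uniq):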

 cut -d ' ' -f1 filename | uniq -c

gives

  2 13:48
  3 13:49


If you don't need to know what happened, only how many times, then:

$ python3.1 -c'from collections import Counter
import fileinput
c = Counter(line.split(None, 1)[0] for line in fileinput.input() if line.strip())
print(c)' events.txt 

Output:

Counter({'13:49': 3, '13:48': 2})
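The question also asks for an average. A minimal sketch building on the same Counter idea: divide the total number of events by the number of distinct minutes. The function name average_per_minute is hypothetical. Note the design choice: minutes in which nothing happened never appear in the log, so they are not counted in the denominator; if you want to include them, you would divide by the length of the whole time span instead.

```python
from collections import Counter

def average_per_minute(lines):
    """Return the mean number of events per minute that appears in the log."""
    # Count events per HH:MM stamp, ignoring blank lines.
    counts = Counter(line.split(None, 1)[0] for line in lines if line.strip())
    if not counts:
        return 0.0  # no events at all
    # Total events divided by the number of distinct minutes seen.
    return sum(counts.values()) / len(counts)

log = [
    "13:48 An event happened.",
    "13:48 Another event happened.",
    "13:49 A new event happened.",
    "13:49 A random event happened.",
    "13:49 An event happened.",
]
print(average_per_minute(log))  # 2.5
```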


You can also use the groupby function from the itertools module, with the time as the grouping key. groupby only groups consecutive items, so this relies on the log being in time order.

>>> import itertools
>>> from operator import itemgetter
>>> with open('log.txt') as f:
...     lines = (line.strip().split(None, 1) for line in f if line.strip())
...     for key, group in itertools.groupby(lines, key=itemgetter(0)):
...         print('%s - %s' % (key, list(map(itemgetter(1), group))))
... 
13:48 - ['An event happened.', 'Another event happened.']
13:49 - ['A new event happened.', 'A random event happened.', 'An event happened.']
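Because groupby only merges adjacent lines with the same key, an out-of-order paste would split one minute into several groups. A minimal sketch that sorts the timestamps first and then counts each group, assuming zero-padded HH:MM stamps from a single day (so string order matches time order); counts_per_minute is a hypothetical name.

```python
import itertools

def counts_per_minute(lines):
    """Return a list of (HH:MM, count) pairs built with itertools.groupby."""
    # Keep only the timestamp from each non-blank line.
    stamps = [line.split(None, 1)[0] for line in lines if line.strip()]
    # groupby merges only adjacent equal keys, so sort first in case the
    # pasted log is out of order. Zero-padded "HH:MM" strings sort chronologically.
    stamps.sort()
    return [(stamp, sum(1 for _ in group)) for stamp, group in itertools.groupby(stamps)]

# Deliberately out of order to show why the sort matters.
pasted = ["13:49 A new event happened.", "13:48 An event happened.", "13:49 An event happened."]
print(counts_per_minute(pasted))  # [('13:48', 1), ('13:49', 2)]
```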


Or with awk (note that for (i in _) visits the keys in no particular order):

awk '{_[$1]++}END{for(i in _) print i,_[i]}' filename