Parsing a log file to find most active times
I have a logfile that looks like this:
Sun Mar 13 23:45:01 EDT 2011 - 2 game(s) running
It updates every 15 minutes.
I want to write code (was going to use Python) that'll parse this and tell me what time is the most active.
I understand this is a bit vague, but I wanted to hear different approaches I could take.
For parsing times, you probably want time.strptime (http://docs.python.org/library/time.html#time.strptime). For breaking down each line from the logfile, you could use a regular expression or just something like splitting on " - " and then parsing the number of games ad hoc.
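A minimal sketch of the split-and-strptime approach for a single line, assuming every line has exactly the shape shown in the question. The names parse_line and TIME_FORMAT are made up for illustration. One caveat: strptime's %Z directive typically only recognises UTC, GMT and the local machine's own zone names, so this sketch drops the "EDT" token before parsing instead of relying on %Z.

```python
import time

# Assumed format of the timestamp once the timezone token is removed.
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_line(line):
    """Return (struct_time, n_games) for one log line (hypothetical helper)."""
    timestr, gamestr = line.strip().split(" - ", 1)
    parts = timestr.split()
    # Drop the timezone abbreviation, assumed to be the 5th token (e.g. "EDT").
    del parts[4]
    when = time.strptime(" ".join(parts), TIME_FORMAT)
    # "2 game(s) running" -> 2
    n_games = int(gamestr.split()[0])
    return when, n_games

when, n_games = parse_line("Sun Mar 13 23:45:01 EDT 2011 - 2 game(s) running")
print(when.tm_hour, when.tm_min, n_games)  # prints: 23 45 2
```

Dropping the zone means all times are treated as local wall-clock times, which is probably what you want for "what hour of the day is busiest".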
For finding most-active times, how clever you need to be depends on what sort of answer you want. For instance, you could just classify times according to what hour of the day they're in:
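    import time
    from collections import defaultdict

    entry_counts = defaultdict(int)   # number of log entries seen per hour
    game_counts = defaultdict(int)    # total games seen per hour
    busyness_by_hour = {}

    with open(logfile, 'r') as f:
        for line in f:
            (timestr, gamestr) = line.split(' - ')
            hour = time.strptime(timestr, time_format).tm_hour
            n_games = parse_game_count(gamestr)
            entry_counts[hour] += 1
            game_counts[hour] += n_games
    for hour in range(24):  # tm_hour is always in 0..23
        if entry_counts[hour]:  # skip hours with no entries (avoids dividing by zero)
            busyness_by_hour[hour] = game_counts[hour] / entry_counts[hour]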
(warning 1: untested code; warning 2: some details omitted, such as the definitions of logfile, time_format and parse_game_count; warning 3: on Python 2 that last division will do integer division, which isn't what you want, unless you add "from __future__ import division" or convert one operand to float.)
You might actually care more about recent entries in the log (in which case, e.g., you could weight more recent entries more highly -- entry_counts[hour] += weight and game_counts[hour] += weight*n_games where weight is bigger for more recent entries). You might want quarter-hour resolution. If your updates aren't exactly every 15 minutes then you might want to do some sort of fancy curve-fitting to estimate activity at finer granularity.
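Here is one way the weighting and the quarter-hour resolution could look together. This is only a sketch: the exponential decay with a half-life in days is my own choice of weight (the idea above only says "bigger for more recent entries"), the function name weighted_busyness is hypothetical, and the entries are assumed to be already parsed into (datetime, n_games) pairs. The sample data is invented.

```python
from collections import defaultdict
from datetime import datetime

def weighted_busyness(entries, half_life_days=7.0):
    """Weighted average game count per (hour, quarter) slot.

    entries: iterable of (datetime, n_games) pairs.
    An entry half_life_days older than the newest one counts half as much.
    """
    entries = list(entries)
    if not entries:
        return {}
    newest = max(when for when, _ in entries)
    entry_weights = defaultdict(float)
    game_weights = defaultdict(float)
    for when, n_games in entries:
        age_days = (newest - when).total_seconds() / 86400.0
        weight = 0.5 ** (age_days / half_life_days)
        slot = (when.hour, when.minute // 15)  # quarter-hour bucket
        entry_weights[slot] += weight
        game_weights[slot] += weight * n_games
    return {slot: game_weights[slot] / entry_weights[slot]
            for slot in entry_weights}

# Made-up sample entries: same time slot one week apart, plus a midday one.
sample = [
    (datetime(2011, 3, 6, 23, 45, 1), 4),
    (datetime(2011, 3, 13, 23, 45, 1), 2),
    (datetime(2011, 3, 13, 12, 0, 1), 1),
]
result = weighted_busyness(sample)
busiest = max(result, key=result.get)
print(busiest)  # prints: (23, 3)
```

In the sample, the week-old reading of 4 games gets weight 0.5 and the newest reading of 2 games gets weight 1, so the 23:45 slot averages (0.5*4 + 1*2) / 1.5, roughly 2.67.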
First you may use regular expressions to separate the date and the number for each line:
r'^(.*?) - (\d+).*$'
Then you can use strptime to convert the first group captured by the regex to a date.
Then you have it. You know what to do next :)
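For completeness, a rough sketch of where the regex route could lead, assuming the line format from the question. It uses (\d+) for the count so that the second group is never empty, and it removes the timezone token before calling strptime because %Z often only accepts the local machine's zone names. The names LINE_RE, TIME_FORMAT and busiest_hour, and the sample lines, are invented for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

LINE_RE = re.compile(r'^(.*?) - (\d+)')
# Assumed timestamp format once the timezone token is removed.
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

def busiest_hour(lines):
    """Return the hour (0..23) with the highest average game count, or None."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't look like log entries
        parts = m.group(1).split()
        # Drop the timezone abbreviation, assumed to be the 5th token.
        stamp = " ".join(parts[:4] + parts[5:])
        when = datetime.strptime(stamp, TIME_FORMAT)
        totals[when.hour] += int(m.group(2))
        counts[when.hour] += 1
    if not counts:
        return None
    return max(counts, key=lambda h: totals[h] / float(counts[h]))

# Made-up sample lines in the format from the question.
sample_lines = [
    "Sun Mar 13 22:45:01 EDT 2011 - 1 game(s) running",
    "Sun Mar 13 23:00:01 EDT 2011 - 5 game(s) running",
    "Sun Mar 13 23:45:01 EDT 2011 - 2 game(s) running",
]
print(busiest_hour(sample_lines))  # prints: 23
```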