Parse timestamps in a plain text file and count them per 5-minute interval
My input is a plain text file containing 6,000 timestamps. It looks like this:
2011-06-21 13:17:05,905
2011-06-21 13:17:11,371
2011-06-21 13:17:16,380
2011-06-21 13:17:20,074
2011-06-21 13:17:20,174
2011-06-21 13:17:24,749
2011-06-21 13:17:27,210
2011-06-21 13:17:27,354
2011-06-21 13:17:29,231
2011-06-21 13:17:29,965
2011-06-21 13:17:32,100
2011-06-21 13:17:32,250
2011-06-21 13:17:45,482
2011-06-21 13:17:51,998
2011-06-21 13:18:03,037
2011-06-21 13:18:04,504
2011-06-21 13:18:10,019
2011-06-21 13:18:27,434
2011-06-21 13:18:29,960
2011-06-21 13:18:30,525
...
My output should be a CSV file counting how many lines fall within each 5-minute slot, starting at the "whole hour".
Example Output:
From, To, Count
13:00:00, 13:04:59, 0
13:05:00, 13:09:59, 0
13:10:00, 13:14:59, 19
13:15:00, 13:19:59, 24
...
Thanks!
This is untested, and you'll have to implement the time conversion functions yourself. Look in the time module for functions that do what you want. convert_time_string_to_unix_time should convert a time string to the corresponding number of milliseconds since Jan 1st, 1970 (a Unix timestamp, but in milliseconds rather than seconds). convert_unix_time_to_time_string should do the reverse.
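One possible way to fill in the two conversion helpers is sketched below. It uses the datetime module rather than time, and it assumes the timestamps carry no time zone and can be treated as UTC, and that only the time of day is wanted when converting back. TIME_FORMAT and EPOCH are names introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed input layout: "2011-06-21 13:17:05,905" (comma before the milliseconds).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_time_string_to_unix_time(text):
    """Return whole milliseconds since Jan 1st, 1970 for one timestamp line."""
    # Assumption: the file has no zone information, so the value is read as UTC.
    parsed = datetime.strptime(text.strip(), TIME_FORMAT).replace(tzinfo=timezone.utc)
    # Dividing one timedelta by another with // yields an integer.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def convert_unix_time_to_time_string(millis):
    """Return the HH:MM:SS time of day for a millisecond timestamp."""
    return (EPOCH + timedelta(milliseconds=millis)).strftime("%H:%M:%S")
```

Because both helpers use the same UTC assumption, the round trip gives back the wall-clock time that was in the file.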
The code below divides time into five-minute slots, loops through all the timestamps, and increases the count for each timestamp's slot by 1. It then iterates over the slots that were found, converts each slot's boundaries back to time strings, and prints them along with the number of timestamps in that slot.
SLOT_LENGTH = 1000 * 60 * 5  # five minutes, in milliseconds
bucket = {}  # slot index -> number of timestamps in that slot
for line in file:
    # Integer division maps every timestamp to the index of its slot.
    slot = convert_time_string_to_unix_time(line) // SLOT_LENGTH
    bucket[slot] = bucket.get(slot, 0) + 1
for slot in sorted(bucket):
    print(
        convert_unix_time_to_time_string(slot * SLOT_LENGTH),
        convert_unix_time_to_time_string((slot + 1) * SLOT_LENGTH - 1),
        bucket[slot],
    )
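Note that the loop above only reports slots that contain at least one timestamp, while the example output in the question also lists empty slots starting at the whole hour. A self-contained variant that covers this might look like the sketch below. It is one possible approach, not the code above completed: it floors each datetime directly instead of going through Unix time, and count_per_slot, write_csv, SLOT and TIME_FORMAT are names made up for illustration.

```python
import csv
from collections import Counter
from datetime import datetime, timedelta

SLOT = timedelta(minutes=5)
# Assumed input layout: "2011-06-21 13:17:05,905".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def count_per_slot(lines):
    """Return (slot_start, count) pairs, zero-filled from the first whole hour."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp = datetime.strptime(line, TIME_FORMAT)
        # Floor the timestamp to the start of its five-minute slot.
        start = stamp.replace(minute=stamp.minute - stamp.minute % 5,
                              second=0, microsecond=0)
        counts[start] += 1
    if not counts:
        return []
    current = min(counts).replace(minute=0)  # whole hour of the earliest slot
    last = max(counts)
    rows = []
    while current <= last:
        rows.append((current, counts[current]))  # a missing slot counts as 0
        current += SLOT
    return rows


def write_csv(rows, out):
    """Write From, To, Count rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["From", "To", "Count"])
    for start, count in rows:
        end = start + SLOT - timedelta(seconds=1)
        writer.writerow([start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S"), count])
```

Something like write_csv(count_per_slot(open("timestamps.txt")), sys.stdout) would then print the table, where timestamps.txt is a placeholder file name. The csv module writes fields without a space after the comma, so the output differs slightly from the layout in the question.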