Analyse data using time and date objects
I have a rather unique problem I'm trying to solve:
Based on this sample data (actual data is very many records, and at least 4 per card per day):
serial, card, rec_date, rec_time, retrieved_on
2976 00040 2010-07-29 18:57 2010-07-31 13:37:31
2977 00040 2010-07-30 09:58 2010-07-31 13:37:31
2978 00040 2010-07-30 15:33 2010-07-31 13:37:31
2979 00040 2010-07-30 16:13 2010-07-31 13:37:31
2980 00040 2010-07-30 19:41 2010-07-31 13:37:31
The records are from a time-clock system.
What I want to do is take a certain group of entries, filtered by card
and rec_date
, then determine how long the person has been working during the day and the length of each workspan, how many breaks he/she has taken, and at the end of the week get the total number of hours worked.
From the above list, 2977
is a check in, then 2978
is a check out and so on.
I'm lost at how to do this though, so I thought someone here would开发者_运维技巧 have an idea.
I'm using a simple class to store this data after importing else from elsewhere:
class TimeClock(models.Model):
serial = models.CharField(max_length = 16)
card_no = models.CharField(max_length = 10)
rec_date = models.DateField()
rec_time = models.TimeField()
oper_date = models.DateTimeField(default=datetime.today)
Clearly, class TimeClock -- by itself -- is inadequate for what you're doing.
You need to summarize TimeClock to create WorkIntervals, which you can work with. These are pairs of TimeClock rows that show the (theoretical) start and end of a work span.
If someone fails to clock in, you're completely unable to reason out what's going on. It's not "hard", it's impossible.
Also, if someone works past midnight, you're unable to reason out what's going on. It's not "hard", it's impossible.
But, we'll pretend no one works past midnight and no one fails to clock in or out (hahaha)
def make_pairs( tc_query_set ):
start = None
for row in tc_query_set:
if start is None:
start= row
continue
elif start.card == row.card and start.rec_date == row.rec_date:
yield start, row
start= None
else:
# May as well raise an exception -- the data cannot the processed
yield start, None
start= row
You use this as follows.
data = TimeClock.objects.order_by('card','rec_date','rec_time').all()
for start, end in make_pairs( data ):
WorkIntervals.objects.create( start.card, start.rec_date, start.rec_time, end.rec_time, ... )
Now you can work with the intervals. If it was possible to create them.
Well, there are a bunch of separate problems here. I assume that you've already got filtered data, so that your log looks like all the events for a unique card on a particular day. Suppose this data is stored as a list of strings in log
. Then:
import datetime
def dates( log ):
''' Yields consecutive datetimes in the log. '''
for event in log:
yield datetime.datetime.strptime( event[ 12 : 28 ], "%Y-%m-%d %H:%M" )
def time_clocked_in( log ):
assert not len( log ) % 2
total_time = datetime.timedelta( 0 )
event_dates = dates( log )
try:
while 1:
total_time -= next( event_dates ) - next( event_dates )
except StopIteration:
pass
return total_time
log = [
"2977 00040 2010-07-30 09:58 2010-07-31 13:37:31",
"2978 00040 2010-07-30 15:33 2010-07-31 13:37:31",
"2979 00040 2010-07-30 16:13 2010-07-31 13:37:31",
"2980 00040 2010-07-30 19:41 2010-07-31 13:37:31"
]
print( time_clocked_in( log ) )
>>> 9:03:00
精彩评论