django aggregation to lower resolution using grouping by a date range
horrible title, but let me explain: i've got this django model containing a timestamp (date) and the attribute to log - f.e. the number of users consuming some ressource - (value).
class Viewers(models.Model):
date = models.DateTimeField()
value = models.IntegerField()
for each 10seconds the table contains the number of users. something like this:
| date | value |
|------|-------|
| t1 | 15 |
| t2 | 18 |
| t3 | 27 |
| t4 | 25 |
| .. | .. |
| t30 | 38 |
| t31 | 36 |
| .. | .. |
now i want to generate different statistics from this data, each with another resolution. f.e. for a chart of the last day i don't need the 10second resolution, so i want 5 minute steps (that are build by averaging the values (and maybe also the date) of the rows from t1 to t29, t30 to t59, ...), so that i'll get:
| date | value |
|------|-------|
| t15 | 21 |
| t45 | 32 |
| .. 开发者_Go百科| .. |
the attributes to keep variable are start & end timestamp and the resolution (like 5 minutes). is there a way using the django orm/queryset api and if not, how to reach this with custom sql?
I've been trying to solve this problem in the most 'django' way possible. I've settled for the following. It averages the values for 15minute time slots between start_date and end_date where the column name is'date':
readings = Reading.objects.filter(date__range=(start_date, end_date)) \
.extra(select={'date_slice': "FLOOR (EXTRACT (EPOCH FROM date) / '900' )"}) \
.values('date_slice') \
.annotate(value_avg=Avg('value'))
It returns a dictionary:
{'value_avg': 1116.4925373134329, 'date_slice': 1546512.0}
{'value_avg': 1001.2028985507246, 'date_slice': 1546513.0}
{'value_avg': 1180.6285714285714, 'date_slice': 1546514.0}
The core of the idea comes from this answer to the same question for PHP/SQL. The code passed to extra is for a Postgres DB.
from django.db.models import Avg
Viewers.objects.filter(date__range=(start_time, end_time)).aggregate(average=Avg('value'))
That will get you the average of all the values
between start_time
and end_time
, returned as a dictionary in the form of { 'average': <the average> }
.
start_time
and end_time
need to be Python datetime objects. So if you have a timestamp, or something, you'll need to convert it first. You can also use datetime.timedelta
to calculate the end_time
based on the start_time. For a five minute resolution, something like this:
from datetime import timedelta
end_time = start_time + timedelta(minutes=5)
have you looked at the range filter?
https://docs.djangoproject.com/en/dev/ref/models/querysets/#range
The example given in the doc's seems similar to your situation.
Slightly improving upon the answer by @Richard Corden, in Postgresql you can do
def for_interval(self, start=None, end=None, interval=60):
# (Check start and end values...)
return self
.filter(timestamp__range=(start, end)) \
.annotate(
unix_timestamp=Floor(Extract('timestamp', 'epoch') / interval) * interval,
time=Func(F('unix_timestamp'), function="TO_TIMESTAMP", output_field=models.DateTimeField()),
) \
.values('time') \
.annotate(value=Avg('value')) \
.order_by('time')
I would also recommend storing the floor of the interval rather than its midpoint.
After long trying i made it as SQL-statement:
SELECT FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))), SUM(value)
FROM `my_table`
WHERE date BETWEEN SUBTIME(NOW( ), '0:30:00') AND NOW()
GROUP BY UNIX_TIMESTAMP(date) DIV 300
ORDER BY date DESC
with
start_time = SUBTIME(NOW( ), '0:30:00')
end_time = NOW()
period = 300 # in seconds
in the end - not really hard - and indeed independent from the time resolution of the samplings in the origin table.
精彩评论