开发者

Querying data from Django

Here's what my model structure looks like:

开发者_开发问答
class Visitor(models.Model):
    id = models.AutoField(primary_key=True)

class Session(models.Model):
    id = models.AutoField(primary_key=True)
    visit = models.ForeignKey(Visitor)
    sequence_no = models.IntegerField(null=False)

class Track(models.Model):
    id = models.AutoField(primary_key=True)
    session = models.ForeignKey(Session)
    action = models.ForeignKey(Action)
    when = models.DateTimeField(null=False, auto_now_add=True)
    sequence_no = models.IntegerField(null = False)

class Action(models.Model):
    id = models.AutoField(primary_key=True)
    url = models.CharField(max_length=65535, null=False)
    host = models.IntegerField(null=False)

As you can see, each Visitor has multiple Sessions; each Session has multiple Tracks and each Track has one Action. Tracks are always ordered ascendingly by the session and the sequence_no. A Visitors average time on an site (i.e. a particular Action.host) is the difference in Track.when (time) between the highest and lowest Track.sequence_no divided by the number of Sessions of that Visitor.

I need to calculate the average time of visitors on the site which be the sum of the time for each visitor on the Action.site divided by the number of visitors.

I could query this using SQL but I'd like to keep my query as Djangonic as possible and I'm still very lost with complex queries.


For a specific Action object you can gather interesting data about Sessions:

from django.db.models import Min, Max
from yourapp.models import *

host = 1  # I suppose you want to calculate for each site

sessions = list(Session.objects.filter(
    track__action__host=host,
).annotate(
    start=Min('track__when'),
    end=Max('track__when'), 
).values('visit_id', 'start', 'end'))

You will get something in the line of:

[ 
  { 'visit_id': 1, 'start': datetime(...), 'end': datetime(...) },
  { 'visit_id': 1, 'start': datetime(...), 'end': datetime(...) },
  { 'visit_id': 2, 'start': datetime(...), 'end': datetime(...) },
 ....
]

Now it's only a matter of getting the desired result from the data:

number_of_visitors = len(set(s['visit_id'] for s in sessions))
total_time = sum((s['end'] - s['start']).total_seconds() for s in sessions)
average_time_spent = total_time / number_of_visitors

Another way is to use two queries instead of one, and avoid the len(set(...)) snippet:

sessions = Session.objects.filter(
    track__action__host=host,
).annotate(
    start=Min('track__when'),
    end=Max('track__when'), 
)

number_of_visitors = sessions.values('visit_id').distict().count()
total_time = sum((s['end'] - s['start']).total_seconds() 
    for s in sessions.values('start', 'end'))

There is NO WAY to do actual calculated fields barring the provided aggregations, so either you do it in raw SQL or you do in code like this.

At least the proposed solution uses Django's ORM as far as possible.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜