Django: Distinct foreign keys

class Log(models.Model):
    project = models.ForeignKey(Project)
    msg = models.CharField(...)
    date = models.DateField(...)

I want to select the four most recent Log entries, where each entry must have a unique project foreign key. I've tried the solutions I found through Google, but none of them works, and the Django documentation isn't very good for lookups.

I tried stuff like:

Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]

But these either return nothing or return Log entries of the same project.

Any help would be appreciated!


Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:

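# Project IDs, newest log first. Caveat: order_by() fields are added to the
# SELECT DISTINCT columns, so the same project can still appear more than once.
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(project_id__in=id_list)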
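
To see at the SQL level why you can only ask for the ID when you want unique IDs, here is a small sketch using Python's built-in sqlite3 module. The table layout and the sample rows are assumptions made up for illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical sample rows: (project_id, msg, date as ISO text).
ROWS = [
    (1, "first", "2020-01-01"),
    (1, "second", "2020-01-05"),
    (2, "third", "2020-01-03"),
]


def distinct_demo(rows=ROWS):
    """Return (ids_only, ids_with_date) from two SELECT DISTINCT queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, "
        "msg TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (project_id, msg, date) VALUES (?, ?, ?)", rows
    )
    # DISTINCT over the ID alone: one row per project.
    ids_only = conn.execute(
        "SELECT DISTINCT project_id FROM logs ORDER BY project_id"
    ).fetchall()
    # DISTINCT over (ID, date): every distinct pair survives,
    # so project 1 shows up twice.
    ids_with_date = conn.execute(
        "SELECT DISTINCT project_id, date FROM logs ORDER BY date DESC"
    ).fetchall()
    conn.close()
    return ids_only, ids_with_date
```

Calling distinct_demo() returns two projects for the first query and three rows for the second, because DISTINCT applies to the whole selected row rather than to a single column.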


Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:

SELECT project_id, max(logs.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;

Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use window functions, but those aren't available in older versions or in every database, so I'll do it the more complex way:

SELECT logs.*
FROM logs JOIN (
    SELECT project_id, max(logs.date) as max_date
    FROM logs
    GROUP BY project_id
    ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
   AND logs.date = latest.max_date;
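
If you want to try the join query without a full database, the following is a minimal sketch that runs it against an in-memory SQLite table. The schema, the sample rows, and the function name are all assumptions for illustration, and the outer ORDER BY is an addition so that the output order is predictable.

```python
import sqlite3

# Hypothetical sample rows: (project_id, msg, date as ISO text).
SAMPLE = [
    (1, "a", "2020-01-01"),
    (1, "b", "2020-01-05"),
    (2, "c", "2020-01-03"),
    (3, "d", "2020-01-04"),
    (4, "e", "2020-01-02"),
    (5, "f", "2019-12-31"),
    (5, "g", "2020-01-06"),
]

JOIN_QUERY = """
SELECT logs.project_id, logs.msg, logs.date
FROM logs JOIN (
    SELECT project_id, max(date) AS max_date
    FROM logs
    GROUP BY project_id
    ORDER BY max_date DESC LIMIT ? ) AS latest
ON logs.project_id = latest.project_id
   AND logs.date = latest.max_date
ORDER BY logs.date DESC
"""


def latest_per_project(rows, limit=4):
    """Newest log row of each of the `limit` most recently active projects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, "
        "msg TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (project_id, msg, date) VALUES (?, ?, ?)", rows
    )
    result = conn.execute(JOIN_QUERY, (limit,)).fetchall()
    conn.close()
    return result
```

With the sample rows, latest_per_project(SAMPLE) gives one row each for projects 5, 1, 3 and 2, and project 4 is dropped because its newest entry is the oldest of the five. Note that if two entries of one project share the same latest date, the join returns both of them.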

Now, if you have access to window functions, it's a bit neater (I think, anyway), and certainly faster to execute:

SELECT * FROM (
   SELECT logs.field1, logs.field2, logs.field3, logs.date,
       rank() over ( partition by project_id
                     order by "date" DESC ) as dateorder
   FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logsort."date" DESC LIMIT 4;

OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
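
For anyone who wants to experiment with the window-function form, here is a rough sketch against an in-memory SQLite table. It assumes SQLite 3.25 or later (the first release with window functions), and the schema, sample rows, and function name are hypothetical. It says nothing about speed; it only shows what the query returns.

```python
import sqlite3

# Hypothetical sample rows: (project_id, msg, date as ISO text).
SAMPLE = [
    (1, "a", "2020-01-01"),
    (1, "b", "2020-01-05"),
    (2, "c", "2020-01-03"),
    (3, "d", "2020-01-04"),
    (4, "e", "2020-01-02"),
    (5, "f", "2019-12-31"),
    (5, "g", "2020-01-06"),
]

WINDOW_QUERY = """
SELECT project_id, msg, date FROM (
    SELECT project_id, msg, date,
           rank() OVER ( PARTITION BY project_id
                         ORDER BY date DESC ) AS dateorder
    FROM logs ) AS logsort
WHERE dateorder = 1
ORDER BY date DESC LIMIT ?
"""


def latest_with_window(rows, limit=4):
    """Newest row per project via rank(), then the `limit` newest of those."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, "
        "msg TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (project_id, msg, date) VALUES (?, ?, ?)", rows
    )
    result = conn.execute(WINDOW_QUERY, (limit,)).fetchall()
    conn.close()
    return result
```

On the sample rows this picks the same four entries as the join approach. Because rank() gives tied rows the same rank, a project with two entries on its latest date would still contribute both.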

I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.


I know this is an old post, but in Django 2.0, I think you could just use:

Log.objects.values('project').distinct().order_by('project')[:4]


You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).

from django.db.models import Max

# Projects ordered by their most recent log date (newest first).
latest_project_ids = Log.objects.values(
    'project').annotate(latest=Max('date')).order_by(
    '-latest').values_list('project')

# Filter on the project, not on the Log id: the subquery returns project IDs.
# This matches every entry of those four projects, newest first.
log_objects = Log.objects.filter(
    project__in=latest_project_ids[:4]).order_by('-date')

This looks a bit convoluted, but it should result in a fairly compact query, roughly:

SELECT "log"."id",
       "log"."project_id",
       "log"."msg",
       "log"."date"
FROM "log"
WHERE "log"."project_id" IN
    (SELECT U0."project_id"
     FROM "log" U0
     GROUP BY U0."project_id"
     ORDER BY MAX(U0."date") DESC
     LIMIT 4)
ORDER BY "log"."date" DESC