
How do I speed up iteration of large datasets in Django

I have a QuerySet of approximately 1,500 records from a Django ORM query. I have used the select_related() and only() methods to make sure the query is tight, and I have checked connection.queries to confirm that only this one query runs. That is, I have made sure no extra queries are issued on each iteration.

When I run the SQL copied and pasted from connection.queries, it completes in 0.02 seconds. However, it takes seven seconds to iterate over those records and do nothing with them (pass).

What can I do to speed this up? What causes this slowness?
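For reference, a minimal sketch of the kind of query described above; the Order and Customer model and field names are hypothetical, and inspecting connection.queries requires DEBUG=True:

from django.db import connection, reset_queries

qs = (
    Order.objects
    .select_related("customer")               # pull the related row in the same query
    .only("id", "total", "customer__name")    # fetch only the columns that are needed
)

reset_queries()
for order in qs:                   # ~1,500 rows, each hydrated into a full model instance
    pass
print(len(connection.queries))     # should print 1 if no per-row queries fire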


A QuerySet can get pretty heavy when it's full of model objects. In similar situations, I've used the .values() method on the QuerySet to select just the fields I need as dictionaries, which can be much faster to iterate over.

Django documentation: values_list
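A minimal sketch of that approach, using the same hypothetical Order model as above:

rows = Order.objects.values("id", "total", "customer__name")   # plain dicts, no model instances

for row in rows:
    pass    # row looks like {"id": 1, "total": Decimal("9.99"), "customer__name": "Alice"}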


1,500 records is far from a large dataset, and seven seconds is really too long. There is probably a problem in your models. You can check this easily by fetching the values() query (as Brandon suggests) and then explicitly constructing the 1,500 objects yourself while iterating over the dictionaries. Convert the ValuesQuerySet into a list before the construction so that the database access is factored out of the timing.
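A rough sketch of that check, again assuming the hypothetical Order model; the point is to time object construction separately from the database fetch:

import time

rows = list(Order.objects.values("id", "total"))   # materialize first, so the DB round trip is excluded

start = time.time()
objects = [Order(**row) for row in rows]           # build the model instances by hand
print(time.time() - start, "seconds to construct", len(objects), "objects")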


How are you iterating over each item? Given a QuerySet like:

items = SomeModel.objects.all()

A regular for loop over each item:

for item in items:
    print(item)

Or using the QuerySet's iterator():

for item in items.iterator():
    print(item)

According to the documentation, iterator() can improve performance because it avoids caching the results on the QuerySet. A similar principle applies when looping over very large Python dictionaries: in Python 2 it is better to use iteritems(), which returns an iterator instead of building a full list of items (in Python 3, items() already behaves this way).
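If you want to measure the difference yourself, a small timing sketch along these lines (again with a hypothetical Order model) can help:

import time

qs1 = Order.objects.all()
start = time.time()
for item in qs1:                 # first pass caches every row as a model instance
    pass
print("plain iteration:", time.time() - start)

qs2 = Order.objects.all()
start = time.time()
for item in qs2.iterator():      # streams rows; no result cache is kept
    pass
print("iterator():", time.time() - start)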


Does your model's Meta declaration tell it to "order by" a field that is stored off in some other related table? If so, your attempt to iterate might be triggering 1,500 queries as Django runs off and grabs that field for each item, and then sorts them. Showing us your code would help us unravel the problem!
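For illustration, a hedged sketch of the kind of Meta declaration that can cause extra work, and one way to rule it out while profiling; the models are hypothetical and written for a current Django version:

from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=100)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        ordering = ["customer__name"]   # default ordering on a related table adds a join

# To rule the default ordering out while profiling, clear it explicitly:
for order in Order.objects.order_by():  # an empty order_by() removes Meta.ordering
    pass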
