开发者

How to optimize use of querysets with lists

I have a model that has a couple million objects. Each object represents a call made/received by a company.

To simplify things, let's say this model, Call, has these fields:

calldate, context, channel.

My goal is to know the average # of calls made and received during each hour of the day of the month (load by hour). The catch is: I need to find this for port1 and port2 separately.

As of now, my code works fine, except that it takes around 1 whole minute to give me the result for a range of 4 months and I it seems extremely inefficient.

I've done some simple profiling and discovered that the extend is taking around 99% of the processing time:

queryset = Call.objects.filter(calldate__gte='SOME_DATE')
port1, port2 = [],[]
port1.extend(queryset.filter(context__icontains="e1-1"))
port2.extend(queryset.filter(context__icontains="e1-2"))
channels_in_port1 = ["Port/%d-2" % x for x in range(1,32)]
channels_in_port2 = ["Port/%d-2" % x for x in range(32,63)]

for i in channels_in_port1:
    port1.extend(queryset.filter(channel__icontains=i))
for i in channels_in_port2:
    port2.extend(queryset.filter(channel__icontains=i))

port1 and port2 have around 150k objects combined now.

As soon as I have all calls for port1 and port2, I'm good to go. The rest of the code is basically some for loops for port1 and port2 that sums up and takes the average of calls according to the hour/day/month. Trivial stuff.

I tried to avoid using a开发者_运维问答ny "extend" by using itertools.chain and chaining the querysets instead. However, that made the processing time shift to the part where I do the trivial for loops to calculate the load by hour.

Any alternatives? Better ways to filter the queryset?

Thanks very much!!


Have you considered using django's aggregate functions? http://docs.djangoproject.com/en/dev/topics/db/aggregation/


I presume your problem is with the second set of extends, ie those within the for loops, rather than the first. (The first is completely unnecessary, in any case: rather than defining an empty list up front and extending it, you can just do port1 = list(queryset.filter(context__icontains="e1-1")).)

Anyway, to summarize what I think you are trying to do: you want to get all Call objects for a certain date, in two blocks depending on the value for channel: one where it contains values from 0 to 31, and one with values between 32 and 62.

It seems like you could do this with just two queries, without any extending at all:

port1 = queryset.filter(channel__range=["Port/1-2", "Port/31-2"])
port2 = queryset.filter(channel__range=["Port/1-32", "Port/31-62"])

Does that not do what you want?

Edit in response to comment but that's then just two queries which you can extend, or concatenate. The problem with your code as posted is that you are doing 31 queries and extend operations for each port, which is bound to be expensive. If you just do one each, plus one extend/concat, that will be much cheaper.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜