
Slow Django database operations on Google App Engine

I'm testing Google App Engine and Django-nonrel on the free quota. It seems to me that database operations against the Datastore are hideously slow.

Take for example this simplified function processing a request, which takes in a multipart/form-data of XML blobs, parses them and inserts them to the database:

import cgi
from django.db import transaction

def post(request):
    fields = cgi.FieldStorage(request)
    with transaction.commit_on_success():
        for xmlblob in fields.getlist('xmlblob'):
            blob_object = parse_xml(xmlblob)
            blob_object.save()

blob_object has five fields, all of type CharField.

For just ca. 30 blobs (with about 1 kB of XML altogether), that function takes 5 seconds to return and uses over 30,000 api_cpu_ms. That CPU time is supposed to be equivalent to the amount of work a 1.2 GHz Intel x86 processor could do in the same period, but I am pretty sure no x86 processor available would need 30 seconds to insert 30 rows into a database.

Without saving the objects to the database (that is, just parsing the XML and throwing away the result), the request takes mere milliseconds.

So is Google App Engine really so slow that I can't save even a few dozen entities to the Datastore in a normal request, or am I missing something here? And of course, even if I did the inserts on a Backend or via a Task Queue, it would still cost hundreds of times more than what would seem acceptable.

Edit: I found out that by default, GAE performs two index writes per property for each entity. Most of these properties should not be indexed, so the question becomes: how can I mark properties as unindexed in Django-nonrel?
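
For reference, the underlying App Engine Python SDK does expose this directly: `db.Property` subclasses accept `indexed=False`, and `TextProperty`/`BlobProperty` are never indexed at all. Whether and how django-nonrel (djangoappengine) surfaces that flag on Django model fields is exactly what I'm asking; the sketch below only shows the raw-SDK equivalent of what I want:

```python
# Raw App Engine Python SDK (not django-nonrel).
# `indexed=False` skips the two per-property index writes for that field.
from google.appengine.ext import db

class BlobObject(db.Model):
    payload = db.TextProperty()               # Text/Blob properties are never indexed
    label = db.StringProperty(indexed=False)  # explicitly unindexed string
```

Unindexed properties cannot be filtered or sorted on in queries, which is fine for write-mostly fields like these.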

I still feel, though, that even accounting for the index writes, the database operation is taking a ridiculous amount of time.


In the absence of batch operations, there's not much you can do to reduce wallclock times. Batch operations are pretty essential to reducing wallclock time on App Engine (or any distributed platform with RPCs, really).
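
Batching is exposed in the Python SDK as `db.put(list_of_entities)`, which replaces N serial round-trips with a single RPC. Since a single batch call has historically been capped at 500 entities, larger collections are typically chunked first; here is a minimal pure-Python sketch of that chunking (the helper name and the limit-as-parameter are illustrative):

```python
def chunked(entities, batch_size=500):
    """Yield successive slices of at most batch_size entities,
    each suitable for a single db.put() call."""
    for start in range(0, len(entities), batch_size):
        yield entities[start:start + batch_size]

# e.g. 1200 entities are split into batches of 500, 500 and 200
batches = list(chunked(list(range(1200))))
```

For the 30-entity case in the question, collecting the parsed objects into a list and issuing one batch put would reduce 30 RPCs to 1.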

Under the current billing model, CPU milliseconds reported by the datastore reflect the cost of the operation rather than the actual time it took, and are a way of billing for resources. Under the new billing model, these will be billed explicitly as datastore operations, instead.


I have not found a real answer yet, but I did some cost calculations. Currently, every indexed property field costs around $0.20 to $0.30 per 10k inserts. Under the upcoming billing model (see the Pricing FAQ), the cost will be exactly $0.10 per 100k operations, i.e. $0.20 per indexed field per 100k inserts, given the two index write operations per insert.
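
The new-model arithmetic can be checked directly (using the Pricing FAQ figures quoted above):

```python
# New billing model: $0.10 per 100k datastore operations.
price_per_op = 0.10 / 100_000

# Each insert performs 2 index writes per indexed property.
index_writes_per_insert = 2
inserts = 100_000

cost_per_indexed_field = inserts * index_writes_per_insert * price_per_op
# ~= $0.20 per indexed field per 100k inserts
```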

So since the price appears to drop by roughly a factor of ten, the observed slowness does seem to be unexpected behaviour. As the free quota is quite sufficient for my test runs, and with the new pricing model coming, I won't let it bother me for now.

