JDO on GoogleAppEngine: How to efficiently retrieve a subset of fields from a huge number of records
I'm facing a small scalability problem. I'm using JDO to query my datastore. I need to retrieve all the keys of a given entity (the keys are of type Long). Since the datastore holds 1,000,000 records of this entity, I need to get them very efficiently so that I can loop over the set in a background task.
Which is the most efficient way to do this?
And what if I need not only the key, but also another field? Let's say I've got an entity called TPImage:
Long idPic; //this is my key
String title; //this is the field I want to retrieve together with the key
... // other properties
How may I retrieve both idPic and title in a single efficient query?
Something like
Query q = new Query("select idPic, title from " + TPImage.class.getName());
but more efficient?
Thank you very much!
Bye cghersi
The scaling problem you have is that you need all the keys - not that you can't fetch them efficiently enough. No matter what system you use, this is always going to be at least O(n).
Rather than trying to prefetch everything, you should do your work in batches, and use cursors to retrieve the next set of results efficiently.
If you need a field from the model, you must retrieve the whole model instance - they're stored as serialized blobs, so there's no way to retrieve just one field.
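As a rough illustration of the batch-and-cursor approach described above, here is a minimal, self-contained sketch in plain Java. It does not call the App Engine API: FakeImageStore, Page and BatchProcessor are hypothetical names invented for this example, and the "cursor" is just a stringified index standing in for the opaque cursor a real datastore query would hand back. Each batch loads whole TPImage instances and then reads idPic and title from them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the TPImage entity from the question.
class TPImage {
    final Long idPic;
    final String title;

    TPImage(Long idPic, String title) {
        this.idPic = idPic;
        this.title = title;
    }
}

// One batch of results plus the position marker for the next batch.
class Page {
    final List<TPImage> items;
    final String nextCursor; // null when there are no more results

    Page(List<TPImage> items, String nextCursor) {
        this.items = items;
        this.nextCursor = nextCursor;
    }
}

// Hypothetical in-memory store that mimics cursor-based paging.
// A real datastore would return an opaque cursor instead of an index.
class FakeImageStore {
    private final List<TPImage> all = new ArrayList<>();

    FakeImageStore(int count) {
        for (long i = 1; i <= count; i++) {
            all.add(new TPImage(i, "title-" + i));
        }
    }

    Page fetch(String cursor, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int start = (cursor == null) ? 0 : Integer.parseInt(cursor);
        int end = Math.min(start + batchSize, all.size());
        List<TPImage> items = new ArrayList<>(all.subList(start, end));
        String next = (end < all.size()) ? Integer.toString(end) : null;
        return new Page(items, next);
    }
}

class BatchProcessor {
    // Walks the whole store one batch at a time and records "idPic:title"
    // for every entity. Returns the number of entities processed.
    static int processAll(FakeImageStore store, int batchSize, List<String> sink) {
        int processed = 0;
        String cursor = null;
        do {
            Page page = store.fetch(cursor, batchSize);
            for (TPImage img : page.items) {
                sink.add(img.idPic + ":" + img.title);
                processed++;
            }
            cursor = page.nextCursor; // resume point for the next batch
        } while (cursor != null);
        return processed;
    }
}
```

In a background task, the loop body would typically run once per task invocation, with the cursor passed along to the next invocation, so that no single request has to touch all 1,000,000 records.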
Your question has two parts. For the first part, getting keys only, you can specify that the query should return only keys by setting the parameter keys_only to True when you create it. See here: http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query
This will help somewhat, as you are not retrieving the entire entity. However, it will probably not help you enough if you want to process 1,000,000 all at once. In that case, take Nick's advice and break up the work.