Iterate over large collection in django - cache problem
I need to iterate over a large collection (3 * 10^6 elements) in Django to do some kind of analysis that can't be done with a single SQL statement.
- Is it possible to turn off QuerySet caching in Django? (Caching all the data is not acceptable; the data is around 0.5 GB.)
- Is it possible to make Django fetch the collection in chunks? It seems that it tries to prefetch the whole collection into memory and then iterate over it. I base that on observing the speed of execution:
iter(Coll.objects.all()).next()
- this takes forever
iter(Coll.objects.all()[:10000]).next()
- this takes less than a second
Use QuerySet.iterator() to walk over the results instead of loading them all first.
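A minimal sketch of what that looks like, assuming a model named Coll with a numeric field called value (the field name and import path are just placeholders for your schema):

from myapp.models import Coll  # placeholder import path

total = 0
# iterator() streams rows from the database cursor instead of
# caching the entire QuerySet result in memory.
for obj in Coll.objects.all().iterator():
    total += obj.value  # whatever per-row analysis you need
print(total)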
It seems that the problem was caused by the database backend (SQLite), which doesn't support reading in chunks. I used SQLite because the database will be thrown away after I finish all the computations, but it seems SQLite isn't good even for that.
Here is what I've found in the Django source code of the SQLite backend:
class DatabaseFeatures(BaseDatabaseFeatures):
    # SQLite cannot handle us only partially reading from a cursor's result set
    # and then writing the same rows to the database in another cursor. This
    # setting ensures we always read result sets fully into memory all in one
    # go.
    can_use_chunked_reads = False
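For what it's worth, one workaround I could fall back to (not something Django does automatically; just a sketch assuming Coll has the default integer id primary key) is to page through the table by primary key myself, so each query only materializes a bounded slice of rows:

def iterate_in_chunks(batch_size=1000):
    # Repeatedly fetch the next batch_size rows after the last seen id,
    # so only one small slice is in memory at a time.
    last_id = 0
    while True:
        rows = list(Coll.objects.filter(id__gt=last_id).order_by('id')[:batch_size])
        if not rows:
            break
        for obj in rows:
            yield obj
        last_id = rows[-1].id

for obj in iterate_in_chunks():
    pass  # per-row analysis here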