
How to split a list into parts each smaller than 1 MByte

I have a sorted list of dictionaries returned by a remote API call (typically the response is less than 4 MByte).

I would like to split this list into chunks, where the maximum allowed size of any single resulting chunk is 1 MByte.*

The resulting list of chunks needs to preserve the initial sorting; the chunks will then be serialized (via pickle) and put into different Blob fields, each with a 1 MByte maximum size.

What's the fastest code to achieve that with Python 2.5?

*The number of chunks should be the lowest that satisfies the 1 MByte constraint.


Following up on my comment: you could use this extension together with the following script. Note that this won't optimize the size of the chunks; it only ensures that none of them is larger than MAX.

from sizeof import asizeof

matrix = []
new_chunk = []
size_of_current_chunk = 0
for x in your_sorted_list:
    s = asizeof(x)  # deep size of this element, in bytes
    if size_of_current_chunk + s > MAX:
        # current chunk is full: store it and start a new one
        matrix.append(new_chunk)
        size_of_current_chunk = 0
        new_chunk = []
    size_of_current_chunk += s
    new_chunk.append(x)

if new_chunk:
    matrix.append(new_chunk)

The variable matrix will contain lists of objects, each list adding up to less than MAX bytes.

It'd be interesting to measure the performance of asizeof against simply encoding each object as a JSON string and taking the string's length as the size estimate.
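That JSON-based estimate could be sketched as follows (note that `json` entered the standard library only in Python 2.6; under 2.5 the third-party `simplejson` package provides the same API, and the JSON length is only a rough proxy for the pickled or in-memory size):

```python
import json


def approx_size(obj):
    """Rough byte-size estimate: the length of the object's JSON
    encoding. Only valid for JSON-serializable objects, such as
    the dictionaries returned by the API."""
    return len(json.dumps(obj))
```

For example, `approx_size([1, 2, 3])` counts the 9 characters of `'[1, 2, 3]'`.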


I found the pympler library; its asizeof module provides basic size information for one or several Python objects (tested with Python 2.2.3, 2.3.7, 2.4.5, 2.5.1, 2.5.2 and 2.6).
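A quick way to try it (the fallback to sys.getsizeof is my addition for when pympler isn't installed; be aware that getsizeof is shallow, so it undercounts nested objects):

```python
try:
    # Deep, recursive size: counts the dict plus its keys and values.
    from pympler.asizeof import asizeof
except ImportError:
    # Stdlib fallback (Python 2.6+): shallow size of the container only.
    from sys import getsizeof as asizeof

row = {"id": 1, "name": "example"}
size = asizeof(row)  # size in bytes; exact value varies by platform
```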

