How to write lists "one by one" to a binary file in Python?

I have a piece of code that generates a quite large list in each iteration. To save memory, I want to write each list to a binary file as soon as it has been generated. I have tried this with text files (even setting the mode to "wb" on Linux), but "wb" seems to have no effect on whether the file is written in binary or text format. Moreover, the written file is huge, which I don't want. I am sure that if I can write these lists in binary format, the file will be much smaller. Thanks.


Since you mentioned the need for compressibility, I'd suggest using pickle with the gzip module to compress your output. You can write and read back your lists one at a time; here's an example:

import gzip, pickle

# Each pickle.dump() call writes one self-delimiting record,
# so no separator is needed between the lists.
with gzip.open('pickled.gz', 'wb', compresslevel=9) as output:
    for x in range(10):
        pickle.dump(list(range(10)), output)

And then use a generator to yield the lists back one at a time:

def unpickler(infile):
    # Keep loading records until the end of the stream is reached.
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return

with gzip.open('pickled.gz', 'rb') as infile:
    for l in unpickler(infile):
        print(l)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
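
To get a feel for how much the gzip layer saves, a rough sketch like the one below compares the raw and compressed pickle sizes in memory. It assumes Python 3.2 or later (for gzip.compress); the sample list is made up, and the actual savings depend entirely on what your lists contain.

```python
import gzip
import pickle

# Hypothetical sample data; substitute one of your own lists.
sample = list(range(10000))

raw = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw, compresslevel=9)

print('raw pickle bytes:', len(raw))
print('gzip-compressed: ', len(packed))

# Round trip to confirm that nothing was lost along the way.
restored = pickle.loads(gzip.decompress(packed))
assert restored == sample
```

Highly regular data such as this compresses well; lists of random values will shrink far less.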


You can use cPickle (in Python 3, just pickle, which already uses the C implementation when it is available) to serialize your lists and dump the result to a file.
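
A minimal sketch of that idea, assuming Python 3 and one list per iteration: the file is reopened in append mode for each list, so nothing has to stay in memory or stay open between iterations. The helper names append_list and load_lists and the file name lists.pickle are placeholders made up for this example.

```python
import pickle

def append_list(path, values):
    # 'ab' appends in binary mode, so earlier records are preserved.
    with open(path, 'ab') as f:
        # The highest protocol is a compact binary format.
        pickle.dump(values, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_lists(path):
    # Yield the stored lists one at a time until the file is exhausted.
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

# Start from an empty file for this demo.
open('lists.pickle', 'wb').close()

for i in range(3):
    append_list('lists.pickle', [i] * 4)

for lst in load_lists('lists.pickle'):
    print(lst)
```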


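In Python 2, the only thing the 'b' flag changes is how line-break translation is done on Windows; it has no effect on how your data is encoded. (In Python 3, it also means the file accepts bytes rather than str.) To get a compact file, you need to serialize the lists yourself, for example with pickle: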

import pickle
help(pickle.load)
help(pickle.dump)

# seems fairly efficient, taking about 200 bytes to store [0, 1, ..., 99],
# about 2.7 kB to store [0, 1, ..., 999],
# and about 29 kB to store [0, 1, ..., 9999]
# (exact sizes depend on the Python version and pickle protocol):
>>> len(pickle.dumps(list(range(100))))
208
>>> len(pickle.dumps(list(range(1000))))
2752
>>> len(pickle.dumps(list(range(10000))))
29770

# create and store
data = {}
data['myList'] = list(range(100))
with open('myfile.pickle', 'wb') as f:
    pickle.dump(data, f)

# retrieve
with open('myfile.pickle', 'rb') as f:
    data2 = pickle.load(f)
print(data2)

Note that unpickling untrusted, user-supplied data is insecure, because a crafted pickle can execute arbitrary code when it is loaded. Also make sure to open the file in binary mode, both when writing and when reading.
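
If the lists hold only numbers, another option (not the pickle approach described above) is to pack the values directly with the standard struct module. The following is a minimal sketch, assuming every element fits in a signed 32-bit integer; write_list, read_lists and the file name lists.bin are illustrative names, not part of any library.

```python
import struct

def write_list(f, values):
    """Write one list as a length-prefixed record of 32-bit ints."""
    f.write(struct.pack('<I', len(values)))               # header: item count
    f.write(struct.pack('<%di' % len(values), *values))   # payload: 4 bytes per item

def read_lists(f):
    """Yield the lists back one at a time until end of file."""
    while True:
        header = f.read(4)
        if not header:
            return                                         # clean end of file
        (count,) = struct.unpack('<I', header)
        payload = f.read(4 * count)
        yield list(struct.unpack('<%di' % count, payload))

with open('lists.bin', 'wb') as f:
    for i in range(3):
        write_list(f, list(range(i, i + 5)))

with open('lists.bin', 'rb') as f:
    for lst in read_lists(f):
        print(lst)
```

Each record costs 4 bytes per item plus a 4-byte header, and unlike pickle, reading it back cannot execute code. The same functions also work on a file object returned by gzip.open if further compression is wanted.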
