
Python: dynamic list parsing and processing

I have opened a process with popen which is producing a list of dictionaries, something like:

[{'foo': '1'},{'bar':2},...]

The list takes a long time to create and could be many gigabytes, so I don't want to reconstitute it in memory and then iterate over it.

How can I parse the partially completed list such that I can process each dictionary as it is received?


The Python tokenizer is available as part of the Python standard library, in the tokenize module. It takes as input a readline function (which must supply it one "line" of input per call), so it can operate incrementally. If there are no newlines in your input, you can simulate them, as long as you can identify spots where adding a newline is innocuous (i.e. doesn't break up a token); thanks to the starting [, everything will be one "logical" line anyway, so the only tokens you need to be careful not to split are quoted strings. I'm not pursuing this in depth here, because if you actually do have newlines in your input you won't need to worry about it.

From the stream of tokens you can reconstruct the string representing each dict in the list (from an opening-brace token to the balancing closing brace), and use ast.literal_eval to get the corresponding Python dict.

So, do you have newlines in your input? If so, the whole task should be very easy.
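
Assuming your input does have newlines, here is a minimal sketch of the approach above, for Python 3 (the function name iter_dicts and the brace-matching/token-concatenation details are my own illustration, not a fixed recipe):

import ast
import io
import tokenize

def iter_dicts(readline):
    # readline is any callable returning one line of text per call,
    # e.g. proc.stdout.readline for a popen'd process in text mode --
    # exactly the interface tokenize.generate_tokens expects.
    depth = 0
    parts = []
    for tok in tokenize.generate_tokens(readline):
        if tok.type not in (tokenize.OP, tokenize.STRING,
                            tokenize.NUMBER, tokenize.NAME):
            continue  # skip NL/NEWLINE/ENDMARKER and friends
        if tok.string == '{':
            depth += 1
        if depth:
            parts.append(tok.string)
        if tok.string == '}':
            depth -= 1
            if depth == 0:
                # A complete dict's worth of tokens: rebuild its source
                # text and evaluate it safely with ast.literal_eval.
                yield ast.literal_eval(''.join(parts))
                parts = []

# Demo with an in-memory stream standing in for the subprocess:
stream = io.StringIO("[{'foo': '1'},\n {'bar': 2}]\n")
for d in iter_dicts(stream.readline):
    print(d)  # {'foo': '1'}  then  {'bar': 2}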


Pickle each dictionary separately. The standard-library shelve module can help you do this.

Writer

import shelve
db = shelve.open(filename)
count = 0
for obj in items:  # ...whatever produces the dicts...
    # build the object, then store it; shelve keys must be strings
    db[str(count)] = obj
    count += 1
db['size'] = count
db.close()

Reader

import shelve
db = shelve.open(filename)
size = db['size']
for i in range(size):
    obj = db[str(i)]  # keys were stored as strings
    # process the object
db.close()
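
If you only ever read the dicts back in the order they were written, a variant of the same idea skips shelve and simply appends one pickle per dict to a single file; a minimal sketch (the filename dicts.pkl and the sample data are placeholders):

import pickle

dicts = [{'foo': '1'}, {'bar': 2}]  # stand-in for the real producer

# Writer: append one pickle per dict to the same file
with open('dicts.pkl', 'wb') as f:
    for obj in dicts:
        pickle.dump(obj, f)

# Reader: keep loading one object at a time until the file runs out
with open('dicts.pkl', 'rb') as f:
    while True:
        try:
            obj = pickle.load(f)
        except EOFError:
            break
        print(obj)  # process one dict at a time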
