
Python: dynamic list parsing and processing

I have opened a process with popen which is producing a list of dictionaries, something like:

[{'foo': '1'},{'bar':2},...]

The list takes a long time to create and could be many gigabytes, so I don't want to reconstitute it in memory and then iterate over it.

How can I parse the partially completed list such that I can process each dictionary as it is received?


The Python tokenizer is available as part of the Python standard library, in the tokenize module. It takes as input a readline function (which must supply it one "line" of input per call), so it can operate incrementally. If there are no newlines in your input, you can simulate them, as long as you can identify spots where adding a newline is innocuous (i.e. doesn't break up a token); thanks to the starting [, everything will be one "logical" line anyway, so the only tokens you need to be careful not to split are quoted strings. I'm not pursuing this in depth here, because if you actually do have newlines in your input you won't need to worry about it.

From the stream of tokens you can reconstruct the string representing each dict in the list (from an opening-brace token to the balancing closing brace), and use ast.literal_eval to get the corresponding Python dict.

So, do you have newlines in your input? If so, the whole task should be very easy.
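
Assuming your input does have newlines, here is a minimal sketch of the approach above, for Python 3 (the function name iter_dicts and the brace-matching/token-concatenation details are my own illustration, not a fixed recipe):

import ast
import io
import tokenize

def iter_dicts(readline):
    # readline is any callable returning one line of text per call,
    # e.g. proc.stdout.readline for a popen'd process in text mode --
    # exactly the interface tokenize.generate_tokens expects.
    depth = 0
    parts = []
    for tok in tokenize.generate_tokens(readline):
        if tok.type not in (tokenize.OP, tokenize.STRING,
                            tokenize.NUMBER, tokenize.NAME):
            continue  # skip NL/NEWLINE/ENDMARKER and friends
        if tok.string == '{':
            depth += 1
        if depth:
            parts.append(tok.string)
        if tok.string == '}':
            depth -= 1
            if depth == 0:
                # A complete dict's worth of tokens: rebuild its source
                # text and evaluate it safely with ast.literal_eval.
                yield ast.literal_eval(''.join(parts))
                parts = []

# Demo with an in-memory stream standing in for the subprocess:
stream = io.StringIO("[{'foo': '1'},\n {'bar': 2}]\n")
for d in iter_dicts(stream.readline):
    print(d)  # {'foo': '1'}  then  {'bar': 2}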


Pickle each dictionary separately. The standard-library shelve module can help you do this.

Writer

import shelve
db = shelve.open(filename)
count = 0
for obj in items:  # ...whatever produces the dicts...
    # build the object, then store it; shelve keys must be strings
    db[str(count)] = obj
    count += 1
db['size'] = count
db.close()

Reader

import shelve
db = shelve.open(filename)
size = db['size']
for i in range(size):
    obj = db[str(i)]  # keys were stored as strings
    # process the object
db.close()
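
If you only ever read the dicts back in the order they were written, a variant of the same idea skips shelve and simply appends one pickle per dict to a single file; a minimal sketch (the filename dicts.pkl and the sample data are placeholders):

import pickle

dicts = [{'foo': '1'}, {'bar': 2}]  # stand-in for the real producer

# Writer: append one pickle per dict to the same file
with open('dicts.pkl', 'wb') as f:
    for obj in dicts:
        pickle.dump(obj, f)

# Reader: keep loading one object at a time until the file runs out
with open('dicts.pkl', 'rb') as f:
    while True:
        try:
            obj = pickle.load(f)
        except EOFError:
            break
        print(obj)  # process one dict at a time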
