Python running out of memory parsing XML using cElementTree.iterparse
A simplified version of my XML parsing function is here:
    import xml.etree.cElementTree as ET

    def analyze(xml):
        it = ET.iterparse(file(xml))
        count = 0

        for (ev, el) in it:
            count += 1

        print('count: {0}'.format(count))
This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this:
See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?
Code example:
    import xml.etree.cElementTree as etree

    def getelements(filename_or_file, tag):
        context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
        _, root = next(context)  # get root element
        for event, elem in context:
            if event == 'end' and elem.tag == tag:
                yield elem
                root.clear()  # preserve memory
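
The reason this helps: iterparse still builds the complete element tree in the background while it yields events, so memory grows with the size of the file even if the loop stores nothing. Clearing the root after each processed element drops the children that have already been handled. Below is a minimal sketch of the same idea applied to the counting case from the question, assuming Python 3, where cElementTree no longer exists (it was removed in 3.9) and plain xml.etree.ElementTree is used instead. The function name count_elements and the in-memory sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: Python 3; cElementTree was removed in 3.9


def count_elements(source, tag):
    """Count elements named `tag` while keeping the in-memory tree small."""
    context = iter(ET.iterparse(source, events=('start', 'end')))
    _, root = next(context)  # the first event is the 'start' of the root element
    count = 0
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            count += 1
            root.clear()  # drop children that have already been processed
    return count


# Hypothetical sample document, built in memory for illustration only.
sample = io.BytesIO(
    b'<root>' + b'<item><name>x</name></item>' * 1000 + b'</root>'
)
print(count_elements(sample, 'item'))
```

For a real file, pass a filename or an open binary file object instead of the BytesIO. Note that root.clear() also discards the root's own attributes and text, so read those before the loop if they are needed.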