XML document being parsed as single element instead of sequence of nodes
Given XML that looks like this:
<Store>
<foo>
<book>
<isbn>123456</isbn>
</book>
<title>XYZ</title>
<checkout>no</checkout>
</foo>
<bar>
<book>
<isbn>7890</isbn>
</book>
<title>XYZ2</title>
<checkout>yes</checkout>
</bar>
</Store>
I am getting this as my parsed xmldoc:
>>> from xml.dom import minidom
>>> xmldoc = minidom.parse('bar.xml')
>>> xmldoc.toxml()
u'<?xml version="1.0" ?><Store>\n<foo>\n<book>\n<isbn>123456</isbn>\n</book>\n<t
itle>XYZ</title>\n<checkout>no</checkout>\n</foo>\n<bar>\n<book>\n<isbn>7890</is
bn>\n</book>\n<title>XYZ2</title>\n<checkout>yes</checkout>\n</bar>\n</Store>'
Is there an easy way to pre-process this document so that when it is parsed, it isn't parsed as a single xml element?
An XML document always has a single root element. If you don't care about the root element, just ignore it and look at its children instead!
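For example, using the more modern ElementTree (minidom offers similar possibilities in this respect):
# Prefer the C-accelerated module where it exists (older Pythons);
# fall back to the pure-Python name, which is the only one on Python 3.9+.
try:
    import xml.etree.cElementTree as et
except ImportError:
    import xml.etree.ElementTree as et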
xmlin = '''<Store>
<foo>
<book>
<isbn>123456</isbn>
</book>
<title>XYZ</title>
<checkout>no</checkout>
</foo>
<bar>
<book>
<isbn>7890</isbn>
</book>
<title>XYZ2</title>
<checkout>yes</checkout>
</bar>
</Store>'''
root = et.fromstring(xmlin)
# Iterating an element yields its direct children
# (getchildren() was removed in Python 3.9).
for child in root:
    print(et.tostring(child))
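Once you are looping over the root's children, you can pull individual fields out of each one instead of serializing it back to text. The following is only a minimal sketch under a few assumptions: the `records` list and its dictionary keys are illustrative names that do not appear in the question, and every child is assumed to have the same `book/isbn`, `title`, and `checkout` layout as the sample.

```python
import xml.etree.ElementTree as et

xmlin = '''<Store>
<foo>
<book>
<isbn>123456</isbn>
</book>
<title>XYZ</title>
<checkout>no</checkout>
</foo>
<bar>
<book>
<isbn>7890</isbn>
</book>
<title>XYZ2</title>
<checkout>yes</checkout>
</bar>
</Store>'''

root = et.fromstring(xmlin)

# "records" and its keys are hypothetical names chosen for this sketch.
records = []
for child in root:  # direct children of <Store>: <foo> and <bar>
    records.append({
        'tag': child.tag,
        # findtext() accepts a simple path and returns None if nothing matches
        'isbn': child.findtext('book/isbn'),
        'title': child.findtext('title'),
        'checkout': child.findtext('checkout'),
    })

for record in records:
    print(record)
```

Each child of the root then behaves like its own small document, which is probably what the question was after.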
xmldoc is a parsed XML object. toxml() asks it to convert itself back to a string of XML text again. Explore a little further:
>>> xmldoc.childNodes
[<DOM Element: Store at 0x212b788>]
>>> xmldoc.childNodes[0].childNodes
[<DOM Text node "u'\n'">, <DOM Element: foo at 0x212bcd8>, <DOM Text node "u'\n'">, <DOM Element: bar at 0x212b2d8>, <DOM Text node "u'\n'">]
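The Text nodes in that list are just the newlines between the elements. If you stay with minidom, one way to get at the elements alone is to filter on nodeType. This is a rough sketch, assuming a shortened inline copy of the document; the names `store`, `elements`, and `names` are made up for illustration.

```python
from xml.dom import Node, minidom

# Shortened inline copy of the sample document (an assumption for this sketch).
xmlin = ('<Store>\n<foo>\n<title>XYZ</title>\n</foo>\n'
         '<bar>\n<title>XYZ2</title>\n</bar>\n</Store>')

xmldoc = minidom.parseString(xmlin)
store = xmldoc.documentElement  # the single root element, <Store>

# Keep only element children and skip the whitespace-only Text nodes.
elements = [n for n in store.childNodes if n.nodeType == Node.ELEMENT_NODE]

names = [n.tagName for n in elements]
print(names)
```

Here `names` ends up holding the tag names `foo` and `bar`, with the whitespace nodes dropped.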
Then, realize that DOM is difficult to work with and read about ElementTree.