Python XML Parsing Confusion

I'm using xml.dom.minidom in Python and have retrieved the Book node in the XML tree below. I want to get a list of all its child nodes. In this case, I would expect there to be only one.

<Book>
    <Title>Why is this so hard</Title>
</Book>

When I call:

nodeList = bookNode.childNodes
print("nodeList has " + str(nodeList.length) + " elements")
for node in nodeList:
    print("Found a " + node.nodeName + " node")

I get the following output:

nodeList has 3 elements
Found a #text node
Found a Title node
Found a #text node

What are these random #text nodes? How do I get the tagName and value for each of the legitimate nodes? I want to get a list of key->value pairs for each of the nodes under Book. I don't want to use getElementsByTagName because I will not know all of the tagNames ahead of time.

Title -> "Why is this so hard"

Thanks- Jonathan


The first text node is the whitespace between <Book> and <Title>. The second is the whitespace between </Title> and </Book>.


What are these random #text nodes?

They're hardly random: they're text nodes representing the whitespace you put between tags. The parser has to keep them, or the document would run together in one unreadable line when it's reserialised.
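
To see those whitespace nodes for yourself, here is a minimal sketch that parses the XML from the question and prints each child with repr(), so the newlines and indentation become visible. The names XML, book_node and children are only for illustration; the string is assumed to be laid out exactly as in the question.

```python
from xml.dom import minidom

# The document from the question, including its newlines and indentation.
XML = "<Book>\n    <Title>Why is this so hard</Title>\n</Book>"

book_node = minidom.parseString(XML).documentElement

# Collect (nodeName, nodeValue) for every direct child of <Book>.
# Text nodes carry their characters in nodeValue; elements have None there.
children = [(node.nodeName, node.nodeValue) for node in book_node.childNodes]

for name, value in children:
    # repr() shows the otherwise invisible whitespace characters.
    print(name, repr(value))
```

The two #text entries hold nothing but the line breaks and indentation around <Title>.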

How do I get the tagName and value for each of the legitimate nodes?

Loop over the child nodes, ignoring those that aren't elements.

I want to get a list of key->value pairs for each of the nodes under Book.

book = {}
for child in bookNode.childNodes:
    # Skip the whitespace text nodes; keep only element children.
    if child.nodeType == child.ELEMENT_NODE:
        book[child.tagName] = '' if child.firstChild is None else child.firstChild.data

This assumes that every element contains only a single text node.
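
If that assumption might not hold (for example, an element whose text is split by a CDATA section), one possible variation is to join all of an element's direct text children instead of reading only firstChild. This is just a sketch: text_of and children_to_dict are hypothetical helper names, and the <Author/> element and the CDATA section are made up for the example.

```python
from xml.dom import minidom


def text_of(element):
    """Concatenate the data of all direct text and CDATA children."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def children_to_dict(parent):
    """Map each child element's tagName to its text, skipping non-elements."""
    return {
        child.tagName: text_of(child)
        for child in parent.childNodes
        if child.nodeType == child.ELEMENT_NODE
    }


# Illustrative input: text split by a CDATA section, plus an empty element.
XML = """<Book>
    <Title>Why is <![CDATA[this]]> so hard</Title>
    <Author/>
</Book>"""

book = children_to_dict(minidom.parseString(XML).documentElement)
print(book)
```

Note that a dict keeps only one value per key, so if two children share a tagName the later one wins; collect into a list of pairs instead if that matters.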
