
Python XML to dictionary to iterate over items

I have the following XML example:

<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>

I need to iterate over each tag in a for loop in Python. I've tried many things but I just can't get it.

Thanks for the help.


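I personally use xml.etree.ElementTree (xml.etree.cElementTree on Python 2; that alias was removed in Python 3.9), as I've found it fast, easy to use, and able to handle big (>2GB) files.

import xml.etree.ElementTree as etree

# iterparse yields (event, element) pairs; by default only "end" events are reported.
with open(xml_file_path, "rb") as xml_file:
    for event, elem in etree.iterparse(xml_file):
        if elem.tag == "item":
            print(elem.text)
            elem.clear()  # drop the element's contents once it has been processed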

In the interactive console

>>> x="""<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""
>>> x
'<?xml version="1.0"?>\n<test>\n    <items>\n        <item>item 1</item>\n        <item>item 2</item>\n    </items>\n</test>'
>>> import xml.etree.ElementTree as etree
>>> tree = etree.fromstring(x)
>>> tree
<Element 'test' at 0xb63ad248>
>>> for i in tree:
...     for j in i:
...         print(j)
...
<Element 'item' at 0xb63ad2f0>
<Element 'item' at 0xb63ad338>
>>> for i in tree:
...     for j in i:
...         j.text
...
'item 1'
'item 2'
>>>


import xml.dom.minidom as md

x='''<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>
'''

xml=md.parseString(x)

items=xml.getElementsByTagName("item")
# [<DOM Element: item at 0xc16e40>, <DOM Element: item at 0xc16ee0>]

Since items is a list-like NodeList of DOM elements, you can loop over it with a for statement.
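
A minimal sketch of that loop, assuming each <item> holds a single text node (so firstChild is that text). The names doc and texts are only for illustration:

```python
import xml.dom.minidom as md

x = '''<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>
'''

doc = md.parseString(x)

# getElementsByTagName returns a list-like NodeList of matching elements.
texts = []
for item in doc.getElementsByTagName("item"):
    # Assumption: the element's first child is its text node.
    texts.append(item.firstChild.data)

print(texts)
```

If an element could be empty or contain nested tags, firstChild may be None or another element, so that case would need an extra check.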


Try the XML parser from the xml.sax package in the standard library.

from xml.sax import parse
from xml.sax.handler import ContentHandler
from sys import argv

class Handler(ContentHandler):
    # The *NS variants are only called when namespace processing is enabled;
    # in that mode name is a (uri, localname) tuple.
    def startElementNS(self, name, qname, attrs):
        self.startElement(name, attrs)

    def endElementNS(self, name, qname):
        self.endElement(name)

    def startElement(self, name, attrs):
        pass  # do whatever you like on tag start

    def characters(self, content):
        pass  # handle tag content

    def endElement(self, name):
        pass  # handle tag closing

if __name__ == "__main__":
    parse(argv[1], Handler())

Here I assumed argv[1] is the path to the file you'd like to parse (the first argument to parse() can be a filename or a stream). It is easy to turn this into a for loop: collect the information you need in the methods above, push it onto a list or stack, and iterate over that once parsing has finished.
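
The collect-then-iterate idea could look roughly like this. It is only a sketch: ItemCollector and its attributes are hypothetical names, and the XML is fed from an in-memory stream instead of argv[1] so the example is self-contained:

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class ItemCollector(ContentHandler):
    """Hypothetical handler that gathers the text of every <item> element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.items = []      # finished item texts
        self._buffer = None  # text chunks of the <item> being read, if any

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


xml_bytes = b"""<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""

handler = ItemCollector()
parse(io.BytesIO(xml_bytes), handler)  # parse() accepts a filename or a stream

# Parsing is finished, so the collected values can be looped over normally.
for text in handler.items:
    print(text)
```

Buffering the chunks and joining them in endElement matters because the parser is free to deliver one text node in several characters() calls.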


You would probably like to use something like ElementTree. It is a well-regarded library; I have not used it personally, but I always hear good things.

Also, as of Python 2.5 it's part of the standard library.
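
For reference, a short sketch of what that might look like with the standard-library module, using the XML from the question. The variable names root and item_texts are illustrative only:

```python
import xml.etree.ElementTree as ET

x = """<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""

root = ET.fromstring(x)

# iter("item") walks the whole tree and yields every <item> element,
# so there is no need to nest one loop per level.
item_texts = [item.text for item in root.iter("item")]

for text in item_texts:
    print(text)
```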
