
Parsing XML using xml.etree.cElementTree

I have the following XML in a string named 'xml':

<?xml version="1.0" encoding="ISO-8859-1"?>
<Book>
  <Page>
    <Text>Blah</Text>
  </Page>
</Book>

I'm trying to get the value Blah out of it but I'm having trouble with xml.etree.cElementTree. I've tried the find() and findtext() methods but got nothing. Eventually I did this:

import xml.etree.cElementTree as ET
...
root = ET.fromstring(xml)
element = root.getchildren()[0].getchildren()[0]

element is now the Text element, which is what I want (for this solution anyway), but how do I get the inner text from it? element.text does not work. Any ideas?

EDIT: element.text gives me None

PS: I am using Python 2.5 atm.

As an extra question: what is a better way to parse XML strings in Python?


Please explain what "does not work" means to you. My guess at the code that you ran (or should have run) worked for me (Python 2.x for x in (5, 6)); see below. It even worked on Python 2.1 with the appropriate change to the import statement. Note that I displayed element.tag to show that it refers to the desired element.

>>> xml = """\
... <?xml version="1.0" encoding="ISO-8859-1"?>
... <Book>
...   <Page>
...     <Text>Blah</Text>
...   </Page>
... </Book>
... """
>>> import xml.etree.cElementTree as ET
>>> root = ET.fromstring(xml)
>>> element = root.getchildren()[0].getchildren()[0]
>>> element.tag
'Text'
>>> element.text
'Blah'
>>>
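
As for find() and findtext(): both take a path relative to the element they are called on, and root here is already the Book element, so the path should start at Page rather than at Book. A minimal sketch, assuming the same XML as above. It imports xml.etree.ElementTree instead of cElementTree because cElementTree and getchildren() were both removed in Python 3.9, and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# Same document as in the question; the declaration must be the very
# first thing in the string (no leading newline or spaces).
xml_text = """\
<?xml version="1.0" encoding="ISO-8859-1"?>
<Book>
  <Page>
    <Text>Blah</Text>
  </Page>
</Book>
"""

root = ET.fromstring(xml_text)  # root is the <Book> element

# findtext() returns the text of the first match, or None if nothing matches.
text_via_findtext = root.findtext("Page/Text")

# find() returns the element itself (or None), so check before using .text.
element = root.find("Page/Text")
text_via_find = element.text if element is not None else None

# iter() walks the whole subtree, so no path is needed at all.
text_via_iter = next(root.iter("Text")).text

print(text_via_findtext)
```

A path such as "Book/Page/Text" would return None here, because it looks for a Book child underneath the Book root.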

Perhaps you'd like to take a rain-check on your extra question till we get the first one sorted out ;-)


For non-massive XML files (a few MB, maybe) the way you are doing it should be fine. But if you know the tag and just want its value as output, I found a way to do it, thanks mostly to http://enginerds.craftsy.com/blog/2014/04/parsing-large-xml-files-in-python-without-a-billion-gigs-of-ram.html. I modified it for my needs, and it doesn't need xml.etree at all. For example:

path = 'yourxmlfilepath.xml'
tagyouwant = 'Headline'  # just an example; I wanted the text between 'Headline' tags
opentag = '<' + tagyouwant + '>'
closetag = '</' + tagyouwant + '>'

# Read line by line so the whole file is never held in memory.
# Assumes the opening and closing tags sit on the same line
# and that the opening tag has no attributes.
with open(path, 'r') as inputfile:
    for line in inputfile:
        if opentag in line:
            # trim the tags and surrounding whitespace from the text
            strtoget = line.replace(opentag, "").replace(closetag, "").strip()
            print(strtoget)

Instead of the final print, you can do whatever you want with the string you now have. Alternatively, you can run this from a batch file or the command line, redirect the output to a .txt file, and store all the values as you go (it really depends on what you want to do with them).

Anyway, I thought this was a clever, memory-efficient way to parse huge XML files when you already know exactly what you want to get out of them.
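
One caveat with the line-matching approach: it only finds the text when the opening tag has no attributes and the opening and closing tags are on the same line. If that can't be guaranteed, the standard library's ET.iterparse() is a different technique that also reads the file incrementally. This is a rough sketch rather than a drop-in replacement; the helper name iter_tag_text and the sample data are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def iter_tag_text(source, tag):
    """Yield the text of every <tag> element in source (a path or file object)."""
    # "end" events fire once an element has been fully parsed,
    # so elem.text is complete by the time we read it.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            # Drop the element's text and children to reduce memory use;
            # the emptied element itself stays attached to its parent.
            elem.clear()


# Hypothetical sample data standing in for a large file on disk.
sample = io.BytesIO(
    b"<Feed>"
    b"<Item><Headline>First story</Headline></Item>"
    b"<Item><Headline>Second\nstory</Headline></Item>"
    b"</Feed>"
)
headlines = list(iter_tag_text(sample, "Headline"))
print(headlines)
```

For a real file, pass the path (for example 'yourxmlfilepath.xml') instead of the BytesIO object.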
