Parsing XML using xml.etree.cElementTree
I have the following XML in a string named 'xml':
<?xml version="1.0" encoding="ISO-8859-1"?>
<Book>
<Page>
<Text>Blah</Text>
</Page>
</Book>
I'm trying to get the value Blah out of it, but I'm having trouble with xml.etree.cElementTree. I've tried the find() and findtext() methods but got nothing back. Eventually I did this:
import xml.etree.cElementTree as ET
...
root = ET.fromstring(xml)
element = root.getchildren()[0].getchildren()[0]
element now refers to the <Text> element, which is what I want (for this solution, anyway), but how do I get the inner text from it? element.text does not work. Any ideas?
EDIT: element.text gives me None
PS: I am using Python 2.5 atm.
As an extra question: what is a better way to parse xml strings in python?
Please explain what "does not work" means to you. As far as I can guess what code you ran (or should have run), it worked for me (Python 2.x for x in (5, 6)); see below. It even worked on Python 2.1 with the appropriate change to the import statement. Note that I displayed element.tag to show that it refers to the desired element.
>>> xml = """\
... <?xml version="1.0" encoding="ISO-8859-1"?>
... <Book>
... <Page>
... <Text>Blah</Text>
... </Page>
... </Book>
... """
>>> import xml.etree.cElementTree as ET
>>> root = ET.fromstring(xml)
>>> element = root.getchildren()[0].getchildren()[0]
>>> element.tag
'Text'
>>> element.text
'Blah'
>>>
Perhaps you'd like to take a rain-check on your extra question till we get the first one sorted out ;-)
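For the find()/findtext() route mentioned in the question, here is a minimal sketch. It is written for Python 3 and uses xml.etree.ElementTree, because both the cElementTree module and Element.getchildren() were removed in Python 3.9. Paths passed to find()/findtext() are evaluated relative to the element you call them on. One possible reason for getting nothing back (the question doesn't show the exact calls, so this is only a guess) is starting the path with the root tag Book.

```python
import xml.etree.ElementTree as ET

xml = """<?xml version="1.0" encoding="ISO-8859-1"?>
<Book>
<Page>
<Text>Blah</Text>
</Page>
</Book>
"""

root = ET.fromstring(xml)  # root is the <Book> element itself

# Paths are relative to root, so they must not repeat "Book".
print(root.findtext('Page/Text'))    # Blah
print(root.find('Page/Text').text)   # Blah
print(root.find('Book/Page/Text'))   # None: <Book> has no <Book> child

# './/Text' matches a <Text> element at any depth below root.
print(root.findtext('.//Text'))      # Blah

# Indexing replaces getchildren(): first child of the first child.
element = root[0][0]
print(element.tag, element.text)     # Text Blah
```

findtext() returns None when no element matches, and an empty string when the element exists but has no text, so the two cases can be told apart.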
For XML files that aren't massive (a few MB, say), the way you are doing it should be fine. But if you know the tag and just want its value as output, I found a way to do it, thanks mostly to http://enginerds.craftsy.com/blog/2014/04/parsing-large-xml-files-in-python-without-a-billion-gigs-of-ram.html, which I modified for my needs; it doesn't even need xml.etree at all. For example:
path = 'yourxmlfilepath.xml'
tagyouwant = 'Headline'  # just an example; I wanted the text between 'Headline' tags
opentag = '<' + tagyouwant + '>'
closetag = '</' + tagyouwant + '>'
with open(path, 'r') as inputfile:  # text mode, so each line is a str
    for line in inputfile:
        if opentag in line:
            # trim the tags and the surrounding whitespace/newline from the text
            strtoget = line.replace(opentag, '').replace(closetag, '').strip()
            print(strtoget)
Instead of the final print, you can do whatever you want with the string you now have. Alternatively, you can run this as a batch job or from the command line and redirect the output to a .txt file, storing all the values as you go (it really depends on what you want to do with them).
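Anyway, I thought this was a clever, memory-efficient way to parse huge XML files when you already know exactly what you want to get out of them: only one line is held in memory at a time. It does assume that each opening and closing tag pair sits on a single line and that the opening tag carries no attributes.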
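If you'd rather stay with a real XML parser on very large files, ElementTree's iterparse() also streams the input instead of building the whole tree at once. This is a swapped-in technique, not the method from the linked post or the answer above. It is a minimal sketch, assuming the tag has no namespace prefix; iter_tag_text is a hypothetical helper name. Unlike the line-by-line string match, it also finds tags that carry attributes or whose text spans several lines.

```python
import io
import xml.etree.ElementTree as ET


def iter_tag_text(source, tag):
    """Yield the text of every <tag> element in source (a filename or file object)."""
    for _event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == tag:
            yield elem.text
        # Children fire their 'end' events before their parent, so this
        # element is fully processed; clear it to keep memory use low.
        elem.clear()


# Hypothetical sample data; the second <Headline> has an attribute,
# which a plain '<Headline>' substring test would miss.
data = io.BytesIO(
    b'<feed><item><Headline>First</Headline></item>'
    b'<item><Headline lang="en">Second</Headline></item></feed>'
)
headlines = list(iter_tag_text(data, 'Headline'))
print(headlines)  # ['First', 'Second']
```

Passing a filename such as 'yourxmlfilepath.xml' instead of the BytesIO object works the same way, since iterparse() accepts either.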