Parsing XML in Python using ElementTree example

2022-12-12 02:24 问答作者：

I'm having a hard time finding a good, basic example of how to parse XML in python using Element Tree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with:

<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
        <variableParam>99988</variableParam>
        <timeParam>
            <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
            <endDateTime>2009-11-23T15:15:55.271</endDateTime>
        </timeParam>
     </queryInfo>
     <timeSeries name="NWIS Time Series Instantaneous Values">
         <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
            .....
         </values>
     </timeSeries>
</timeSeriesResponse>

I am able to do what I need, using a hard-coded method. But I need my code to be a bit more dynamic. Here is what worked:

tree = ET.parse(sample.xml)
doc = tree.getroot()

timeseries =  doc[1]
values = timeseries[2]

print child.attrib['dateTime'], child.text
#prints 2009-09-24T15:30:00.000-04:00, 550

Here are a couple of things I've tried, none of them worked, reporting that they couldn't find timeSeries (or anything else I tried):

tree = ET.parse(sample.xml)
tree.find('timeSeries')

tree = ET.parse(samp开发者_如何学运维le.xml)
doc = tree.getroot()
doc.find('timeSeries')

Basically, I want to load the xml file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself; everything I'm doing in the above example, but not hard coding the sections of xml I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?

Thanks for all the help. Using both of the below suggestions worked on the sample file I provided, however, they didn't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method:

 (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

I figured there was something in the real file it didn't like, so I incremently removed things until it worked. Here are the lines that I changed:

originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
 changed to: <timeSeriesResponse>

 originally:  <sourceInfo xsi:type="SiteInfoType">
 changed to: <sourceInfo>

 originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
 changed to: <geogLocation>

Removing the attributes that have 'xsi:...' fixed the problem. Is the 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested work arounds?

Here is the full XML file: http://www.sendspace.com/file/lofcpt

When I originally asked this question, I was unaware of namespaces in XML. Now that I know what's going on, I don't have to remove the "xsi" attributes, which are the namespace declarations. I just include them in my xpath searches. See this page for more info on namespaces in lxml.

So I have ElementTree 1.2.6 on my box now, and ran the following code against the XML chunk you posted:

import elementtree.ElementTree as ET

tree = ET.parse("test.xml")
doc = tree.getroot()
thingy = doc.find('timeSeries')

print thingy.attrib

and got the following back:

{'name': 'NWIS Time Series Instantaneous Values'}

It appears to have found the timeSeries element without needing to use numerical indices.

What would be useful now is knowing what you mean when you say "it doesn't work." Since it works for me given the same input, it is unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, backtraces, or anything you can provide to help us help you.

If I understand your question correctly:

for elem in doc.findall('timeSeries/values/value'):
    print elem.get('dateTime'), elem.text

or if you prefer (and if there is only one occurrence of timeSeries/values:

values = doc.find('timeSeries/values')
for value in values:
    print value.get('dateTime'), elem.text

The findall() method returns a list of all matching elements, whereas find() returns only the first matching element. The first example loops over all the found elements, the second loops over the child elements of the values element, in this case leading to the same result.

I don't see where the problem with not finding timeSeries comes from however. Maybe you just forgot the getroot() call? (note that you don't really need it because you can work from the elementtree itself too, if you change the path expression to for example /timeSeriesResponse/timeSeries/values or //timeSeries/values)

继续阅读：elementtree python xml

Parsing XML in Python using ElementTree example

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？