Python get ID from XML data

2023-03-13 09:58 问答作者：

I am a total python newb and am trying to parse an XML document that is being returned from google as a result of a post request.

The document returned looks like the one outlined in this doc http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#Archives where it says 'The response contains information about the archive.'

The only part I am interested in is the Id attribute right near the beginning. There will only every be 1 entry, and 1 id attribute. How can I extract it to be use later? I've been fighting with this for a while and I feel like I've tried everything from m开发者_开发知识库inidom to elementtree. No matter what I do my search comes back blank, loops don't iterate, or methods are missing. Any assistance is much appreciated. Thank you.

I would highly recommend the Python package BeautifulSoup. It is awesome. Here is a simple example using their example data (assuming you've installed BeautifulSoup already):

from BeautifulSoup import BeautifulSoup

data = """<?xml version='1.0' encoding='utf-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
xmlns:docs='http://schemas.google.com/docs/2007'
xmlns:gd='http://schemas.google.com/g/2005'>
<id>
https://docs.google.com/feeds/archive/-228SJEnnmwemsiDLLxmGeGygWrvW1tMZHHg6ARCy3Uj3SMH1GHlJ2scb8BcHSDDDUosQAocwBQOAKHOq3-0gmKA</id>
<published>2010-11-18T18:34:06.981Z</published>
<updated>2010-11-18T18:34:07.763Z</updated>
<app:edited xmlns:app='http://www.w3.org/2007/app'>
2010-11-18T18:34:07.763Z</app:edited>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/docs/2007#archive'
label='archive' />
<title>Document Archive - someuser@somedomain.com</title>
<link rel='self' type='application/atom+xml'
href='https://docs.google.com/feeds/default/private/archive/-228SJEnnmwemsiDLLxmGeGygWrvW1tMZHHg6ARCy3Uj3SMH1GHlJ2scb8BcHSDDDUosQAocwBQOAKHOq3-0gmKA' />
<link rel='edit' type='application/atom+xml'
href='https://docs.google.com/feeds/default/private/archive/-228SJEnnmwemsiDLLxmGeGygWrvW1tMZHHg6ARCy3Uj3SMH1GHlJ2scb8BcHSDDDUosQAocwBQOAKHOq3-0gmKA' />
<author>
    <name>someuser</name>
    <email>someuser@somedomain.com</email>
</author>
<docs:archiveNotify>someuser@somedomain.com</docs:archiveNotify>
<docs:archiveStatus>flattening</docs:archiveStatus>
<docs:archiveResourceId>
0Adj-hQNOVsTFSNDEkdk2221OTJfMWpxOGI5OWZu</docs:archiveResourceId>
<docs:archiveResourceId>
0Adj-hQNOVsTFZGZodGs2O72NFMllMQDN3a2Rq</docs:archiveResourceId>
<docs:archiveConversion source='application/vnd.google-apps.document'
target='text/plain' />
</entry>"""

soup = BeautifulSoup(data, fromEncoding='utf8')
print soup('id')[0].text

There is also expat, which is built into Python, but it is worth learning BeautifulSoup, because it will respond way better to real-world XML (and HTML).

Assuming the variable response contains a string representation of the returned HTML document, let me tell you the WRONG way to solve your problem

id = response.split("</id>")[0].split("<id>")[1]

The right way to do it is with xml.sax or xml.dom or expat, but personally, I wouldn't be bothered unless I wanted to have robust error handling of exception cases when response contains something unexpected.

EDIT: I forgot about BeautifulSoup, it is indeed as awesome as Travis describes.

If you'd like to use minidom, you can do the following (replace gd.xml with your xml input):

from xml.dom import minidom

dom = minidom.parse("gd.xml")
id = dom.getElementsByTagName("id")[0].childNodes[0].nodeValue
print id

Also, I assume you meant id element, and not id attribute.

继续阅读：parsing python xml

Python get ID from XML data

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？