How to parse xml in Python on Google App Engine
For the following XML, how do I fetch it and then parse it to get the value of <age>?
<boardgames>
<boardgame objectid="13">
<yearpublished>1995</yearpublished>
<minplayers>3</minplayers>
<maxplayers>4</maxplayers>
<playingtime>90</playingtime>
<age>10</age>
<name sortindex="1">Catan</name>
...
I'm currently trying:
from google.appengine.api import urlfetch
from xml.etree import ElementTree
result = urlfetch.fetch(url=game_url)
xml = ElementTree.fromstring(result.content)
But I'm not sure I'm on the right path. When I try to parse it I get errors (I think because the XML is not valid).
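A minimal sketch, assuming Python 3 and only the standard library: catching ElementTree.ParseError shows whether the document really is malformed and where the parser gave up. parse_or_report is a hypothetical helper name, and the two test strings are made up for illustration.

```python
from xml.etree import ElementTree

def parse_or_report(text):
    """Return (root, None) if text parses, else (None, (line, column)) of the error."""
    try:
        return ElementTree.fromstring(text), None
    except ElementTree.ParseError as err:
        # ParseError.position is a (line, column) tuple locating the problem.
        return None, err.position

# Well-formed input parses and the root element is returned.
root, problem = parse_or_report("<boardgames><boardgame><age>10</age></boardgame></boardgames>")
print(root.tag, problem)  # boardgames None

# The closing </boardgame> is missing, so parsing fails with a mismatched tag.
root, problem = parse_or_report("<boardgames><boardgame><age>10</age></boardgames>")
print(root, problem)
```

If this reports an error, a lenient parser such as BeautifulSoup is one way around it; if the document parses cleanly, the problem is more likely the path passed to findtext.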
Because fromstring returns the root <boardgames> element, xml.findtext('boardgame/age') or xml.findtext('.//age') would normally get you the 10 inside <age>10</age>, but the parsing appears to fail due to invalid XML. In my experience, ElementTree does a rather poor job with invalid XML: it stops with a ParseError rather than recovering.
Instead, use BeautifulSoup, which handles invalid XML well:
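import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use "from bs4 import BeautifulSoup"
content = urllib2.urlopen('http://boardgamegeek.com/xmlapi/boardgame/13').read()
soup = BeautifulSoup(content)
print soup.find('age').string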
The following works for me:
import urllib2
from xml.etree import ElementTree
result = urllib2.urlopen('http://boardgamegeek.com/xmlapi/boardgame/13').read()
xml = ElementTree.fromstring(result)
# ".//age" matches <age> at any depth below the root <boardgames> element
print xml.findtext(".//age")
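A sketch (Python 3, standard library only) of why the path matters: fromstring hands back the root <boardgames> element, so findtext paths are relative to it. The SAMPLE string is an assumption (the question's truncated snippet, closed off so it parses), and fetch_age is a hypothetical helper showing the Python 3 spelling of urllib2.urlopen, which is urllib.request.urlopen.

```python
from urllib.request import urlopen
from xml.etree import ElementTree

# Assumed sample: the question's truncated snippet, closed off so it is well-formed.
SAMPLE = """<boardgames>
  <boardgame objectid="13">
    <yearpublished>1995</yearpublished>
    <minplayers>3</minplayers>
    <maxplayers>4</maxplayers>
    <playingtime>90</playingtime>
    <age>10</age>
    <name sortindex="1">Catan</name>
  </boardgame>
</boardgames>"""

root = ElementTree.fromstring(SAMPLE)
print(root.tag)                         # boardgames: fromstring returns the root element
print(root.findtext("age"))             # None: only direct children of <boardgames> are checked
print(root.findtext("boardgames/age"))  # None: <boardgames> is the root, not a child of it
print(root.findtext("boardgame/age"))   # 10
print(root.findtext(".//age"))          # 10: searches every descendant


def fetch_age(url):
    """Hypothetical helper: download an XML document and return the first <age> text."""
    with urlopen(url) as response:
        doc = ElementTree.fromstring(response.read())
    return doc.findtext(".//age")
```

Calling fetch_age('http://boardgamegeek.com/xmlapi/boardgame/13') would do the same as the code above, assuming that endpoint still returns XML shaped like the sample.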