
':' in node name causing KeyError when parsing XML with ElementTree

Hi, I'm using ElementTree to parse an XML feed from Kuler. I'm only beginning in Python and am stuck here. The parsing works fine until I try to retrieve any node whose name contains ':', e.g. kuler:swatchHexColor.

Below is a cut-down version of the full feed, with the same structure:

<rss xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:kuler="http://kuler.adobe.com/kuler/API/rss/" xmlns:rss="http://blogs.law.harvard.edu/tech/rss" version="2.0">
 <channel>
 <title>kuler popular themes</title>
 <item>
 <title>Theme Title: Fresh Money</title>
 <description> 
 &lt;img src="http://kuler-api.adobe.com/kuler/themeImages/theme_808366.png" /&gt;&lt;br /&gt;

 Artist: thesylph005&lt;br /&gt;
 ThemeID: 808366&lt;br /&gt;
 Posted: 03/02/2010&lt;br /&gt;

 Hex:
 2F400D, 8CBF26, A8CA65, E8E5B0, 419184
</description>
<kuler:themeItem>
<kuler:themeID>808366</kuler:themeID>
<kuler:themeTitle>Fresh Money</kuler:themeTitle>
<kuler:themeImage>http://kuler-api.adobe.com/kuler/themeImages/theme_808366.png</kuler:themeImage>
<kuler:themeAuthor>
 <kuler:authorID>370750</kuler:authorID>
 <kuler:authorLabel>thesylph005</kuler:authorLabel>
</kuler:themeAuthor>
<kuler:themeTags/>
<kuler:themeRating>4</kuler:themeRating>
<kuler:themeDownloadCount>708</kuler:themeDownloadCount>
<kuler:themeCreatedAt>20100302</kuler:themeCreatedAt>
<kuler:themeEditedAt>20100302</kuler:themeEditedAt>
<kuler:themeSwatches>
 <kuler:swatch>
  <kuler:swatchHexColor>2F400D</kuler:swatchHexColor>
  <kuler:swatchColorMode>rgb</kuler:swatchColorMode>
  <kuler:swatchChannel1>0.183333</kuler:swatchChannel1>
  <kuler:swatchChannel2>0.25</kuler:swatchChannel2>
  <kuler:swatchChannel3>0.05</kuler:swatchChannel3>
  <kuler:swatchChannel4>0.0</kuler:swatchChannel4>
  <kuler:swatchIndex>0</kuler:swatchIndex>
 </kuler:swatch>
 <kuler:swatch>
  <kuler:swatchHexColor>8CBF26</kuler:swatchHexColor>
  <kuler:swatchColorMode>rgb</kuler:swatchColorMode>
  <kuler:swatchChannel1>0.55</kuler:swatchChannel1>
  <kuler:swatchChannel2>0.75</kuler:swatchChannel2>
  <kuler:swatchChannel3>0.15</kuler:swatchChannel3>
  <kuler:swatchChannel4>0.0</kuler:swatchChannel4>
  <kuler:swatchIndex>1</kuler:swatchIndex>
 </kuler:swatch>
 <kuler:swatch>
  <kuler:swatchHexColor>A8CA65</kuler:swatchHexColor>
  <kuler:swatchColorMode>rgb</kuler:swatchColorMode>
  <kuler:swatchChannel1>0.659722</kuler:swatchChannel1>
  <kuler:swatchChannel2>0.791667</kuler:swatchChannel2>
  <kuler:swatchChannel3>0.395833</kuler:swatchChannel3>
  <kuler:swatchChannel4>0.0</kuler:swatchChannel4>
  <kuler:swatchIndex>2</kuler:swatchIndex>
 </kuler:swatch>
 <kuler:swatch>
  <kuler:swatchHexColor>E8E5B0</kuler:swatchHexColor>
  <kuler:swatchColorMode>rgb</kuler:swatchColorMode>
  <kuler:swatchChannel1>0.91</kuler:swatchChannel1>
  <kuler:swatchChannel2>0.898047</kuler:swatchChannel2>
  <kuler:swatchChannel3>0.688705</kuler:swatchChannel3>
  <kuler:swatchChannel4>0.0</kuler:swatchChannel4>
  <kuler:swatchIndex>3</kuler:swatchIndex>
 </kuler:swatch>
 <kuler:swatch>
  <kuler:swatchHexColor>419184</kuler:swatchHexColor>
  <kuler:swatchColorMode>rgb</kuler:swatchColorMode>
  <kuler:swatchChannel1>0.254901</kuler:swatchChannel1>
  <kuler:swatchChannel2>0.57</kuler:swatchChannel2>
  <kuler:swatchChannel3>0.519034</kuler:swatchChannel3>
  <kuler:swatchChannel4>0.0</kuler:swatchChannel4>
  <kuler:swatchIndex>4</kuler:swatchIndex>
 </kuler:swatch>
</kuler:themeSwatches>

Tue, 30 Mar 2010 11:27:12 PST

If I do a findall on, say, each item's description, I get the result back fine. But as soon as I try to retrieve anything with a ':' in the node name, I get Exception Type: KeyError, Exception Value: ':'.

So this works

import urllib

from elementtree.ElementTree import dump, parse

def xml():
    kulerurl = 'http://kuler-api.adobe.com/rss/get.cfm?listType=popular&startIndex=0&itemsPerPage=5&timeSpan=30&key=mykey'
    # Fetch the feed and parse it; getroot() returns the <rss> element
    rss = parse(urllib.urlopen(kulerurl)).getroot()
    for element in rss.findall('channel/item'):
        print(element.findtext('description'))
    dump(rss)

but this doesn't

def xml():
    kulerurl = 'http://kuler-api.adobe.com/rss/get.cfm?listType=popular&startIndex=0&itemsPerPage=5&timeSpan=30&key=mykey'
    rss = parse(urllib.urlopen(kulerurl)).getroot()
    for element in rss.findall('channel/item/kuler:themeItem'):
        print(element.findtext('kuler:themeID'))
    dump (rss)

I'm sure it's something simple. If anyone could point out what I'm doing wrong here, I'd be most grateful.

Thanks, Kieran


Based on this article (and the comments on it), I think you have to replace the namespace prefix with the actual namespace URI: drop the prefix and the colon, and put the URI in {} in front of the local name:

namespace = 'http://kuler.adobe.com/kuler/API/rss/'

def xml():
    kulerurl = 'http://kuler-api.adobe.com/rss/get.cfm?listType=popular&startIndex=0&itemsPerPage=5&timeSpan=30&key=mykey'
    rss = parse(urllib.urlopen(kulerurl)).getroot()
    for element in rss.findall('channel/item/{%s}themeItem' % namespace):
        print(element.findtext('{%s}themeID' % namespace))
    dump (rss)
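The same substitution can be tried without hitting the live feed. The following is only a sketch under some assumptions: it uses the standard-library xml.etree.ElementTree module rather than the standalone elementtree package from the question, it parses a trimmed inline copy of the sample feed, and the names KULER_NS, SAMPLE and theme_colors are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "kuler" prefix in the feed's <rss> element.
KULER_NS = "http://kuler.adobe.com/kuler/API/rss/"

# Trimmed copy of the feed from the question (illustration only).
SAMPLE = """<rss xmlns:kuler="http://kuler.adobe.com/kuler/API/rss/" version="2.0">
 <channel>
  <item>
   <title>Theme Title: Fresh Money</title>
   <kuler:themeItem>
    <kuler:themeID>808366</kuler:themeID>
    <kuler:themeSwatches>
     <kuler:swatch><kuler:swatchHexColor>2F400D</kuler:swatchHexColor></kuler:swatch>
     <kuler:swatch><kuler:swatchHexColor>8CBF26</kuler:swatchHexColor></kuler:swatch>
    </kuler:themeSwatches>
   </kuler:themeItem>
  </item>
 </channel>
</rss>"""


def theme_colors(xml_text):
    """Return {themeID: [hex colors]} for every kuler:themeItem found."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    themes = {}
    # Each namespaced tag is written as {uri}local instead of prefix:local.
    for item in root.findall("channel/item/{%s}themeItem" % KULER_NS):
        theme_id = item.findtext("{%s}themeID" % KULER_NS)
        swatches = item.findall(
            "{%s}themeSwatches/{%s}swatch" % (KULER_NS, KULER_NS)
        )
        themes[theme_id] = [
            s.findtext("{%s}swatchHexColor" % KULER_NS) for s in swatches
        ]
    return themes


print(theme_colors(SAMPLE))
```

Un-prefixed elements such as channel and item need no braces here because the feed declares no default namespace.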

[XML namespaces]
The element type represents a qualified name pair, also called universal name, as a string of the form “{uri}local”. This syntax can be used both for tag names and for attribute keys.
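To see the universal-name form directly, one can inspect a parsed element. This is a rough sketch using the standard-library xml.etree.ElementTree, whose find, findall and findtext methods also accept a prefix-to-URI mapping as an alternative to writing the braces by hand. The kuler:featured attribute and the names NAMESPACES and SNIPPET are hypothetical, added only to show an attribute key; they are not part of the real feed.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix used here only has to match the
# prefix used in the search paths below, not the one in the document.
NAMESPACES = {"kuler": "http://kuler.adobe.com/kuler/API/rss/"}

# Tiny snippet; the kuler:featured attribute is invented for illustration.
SNIPPET = """<rss xmlns:kuler="http://kuler.adobe.com/kuler/API/rss/">
 <channel><item>
  <kuler:themeItem kuler:featured="yes">
   <kuler:themeID>808366</kuler:themeID>
  </kuler:themeItem>
 </item></channel>
</rss>"""

root = ET.fromstring(SNIPPET)

# With a namespaces mapping, prefix:local paths are expanded to {uri}local.
theme = root.find("channel/item/kuler:themeItem", NAMESPACES)

tag = theme.tag                  # tag name stored in {uri}local form
attr_keys = list(theme.attrib)   # attribute keys use the same form
theme_id = theme.findtext("kuler:themeID", namespaces=NAMESPACES)

print(tag)
print(attr_keys)
print(theme_id)
```

Either spelling finds the same elements; the mapping mostly saves typing when a path contains many namespaced steps.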

You can also read in this introduction how ElementTree handles namespaces.
