How to parse ShrinkTheWeb XML with ElementTree

I am trying to use the ShrinkTheWeb service for site thumbnails. They have an API that returns XML telling you whether the site thumbnail can be created. I am trying to use ElementTree to parse the XML, but I'm not sure how to get to the information I need. Here is an example of the XML response:

<?xml version="1.0" encoding="UTF-8"?> 
<stw:ThumbnailResponse xmlns:stw="http://www.shrinktheweb.com/doc/stwresponse.xsd">
    <stw:Response>
        <stw:ThumbnailResult>
            <stw:Thumbnail Exists="false"></stw:Thumbnail>
            <stw:Thumbnail Verified="false">fix_and_retry</stw:Thumbnail>
        </stw:ThumbnailResult>
        <stw:ResponseStatus>
            <stw:StatusCode>Blank Detected</stw:StatusCode>
        </stw:ResponseStatus>
        <stw:ResponseTimestamp>
            <stw:StatusCode></stw:StatusCode>
        </stw:ResponseTimestamp>
        <stw:ResponseCode>
            <stw:StatusCode></stw:StatusCode>
        </stw:ResponseCode>
        <stw:CategoryCode>
            <stw:StatusCode>none</stw:StatusCode>
        </stw:CategoryCode>
        <stw:Quota_Remaining>
            <stw:StatusCode>1</stw:StatusCode>
        </stw:Quota_Remaining>
    </stw:Response>
</stw:ThumbnailResponse>

I need to get the "stw:StatusCode". If I try to do a find on "stw:StatusCode", I get an "expected path separator" syntax error. Is there a way to just get the status code?


Grrr, namespaces... Try this:

STW_PREFIX = "{http://www.shrinktheweb.com/doc/stwresponse.xsd}"

(see line 2 of your sample XML)

Then when you want a tag like stw:StatusCode, use STW_PREFIX + "StatusCode"
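
As a minimal sketch of how that prefix might be used with find(), assuming Python 3's xml.etree.ElementTree and a trimmed-down copy of the sample response (the variable names and the shortened XML are illustrative, not part of the ShrinkTheWeb API):

```python
import xml.etree.ElementTree as ET

STW_NS = "http://www.shrinktheweb.com/doc/stwresponse.xsd"
STW_PREFIX = "{" + STW_NS + "}"

# Trimmed copy of the sample response, for illustration only.
xml_response = (
    '<stw:ThumbnailResponse xmlns:stw="' + STW_NS + '">'
    "<stw:Response>"
    "<stw:ResponseStatus>"
    "<stw:StatusCode>Blank Detected</stw:StatusCode>"
    "</stw:ResponseStatus>"
    "</stw:Response>"
    "</stw:ThumbnailResponse>"
)

root = ET.fromstring(xml_response)

# Option 1: spell out the namespace URI in braces in front of every tag.
path = ".//" + STW_PREFIX + "ResponseStatus/" + STW_PREFIX + "StatusCode"
status = root.find(path)
status_text = status.text if status is not None else None
print(status_text)

# Option 2: keep the "stw:" shorthand and pass a prefix-to-URI mapping.
same_status = root.find(".//stw:ResponseStatus/stw:StatusCode",
                        namespaces={"stw": STW_NS})
print(same_status.text)
```

Both lookups should reach the same element. The second form reads closer to the XML itself, but it relies on the namespaces argument, which older ElementTree releases may not support.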

Update: That XML response isn't the most brilliant design. It's not possible to tell from your single example whether there can be more than one second-level node (stw:Response). Note that each third-level node other than stw:ThumbnailResult has a "StatusCode" child. Here is some rough-and-ready code that shows you (1) why you need that STW_PREFIX caper and (2) an extract of the usable info.

import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9

def showtag(elem):
    # Drop the "{namespace-uri}" part, keeping only the local tag name
    return repr(elem.tag.rsplit("}", 1)[1])

def showtext(elem):
    return None if elem.text is None else repr(elem.text.strip())

root = et.fromstring(xml_response)  # xml_response is your input string
print(repr(root.tag))  # see exactly what tag is in the element
for child in root[0]:  # root[0] is the single stw:Response element
    print(showtag(child), showtext(child))
    for gc in child:  # grandchildren: the Thumbnail / StatusCode elements
        print("...", showtag(gc), showtext(gc), gc.attrib)

Result:

'{http://www.shrinktheweb.com/doc/stwresponse.xsd}ThumbnailResponse'
'ThumbnailResult' ''
... 'Thumbnail' None {'Exists': 'false'}
... 'Thumbnail' 'fix_and_retry' {'Verified': 'false'}
'ResponseStatus' ''
... 'StatusCode' 'Blank Detected' {}
'ResponseTimestamp' ''
... 'StatusCode' None {}
'ResponseCode' ''
... 'StatusCode' None {}
'CategoryCode' ''
... 'StatusCode' 'none' {}
'Quota_Remaining' ''
... 'StatusCode' '1' {}
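
Building on that output, here is one possible way to collect just the status codes into a dict keyed by the parent tag. This is a sketch under assumptions: the helper name status_codes and the shortened sample are made up for illustration, and it assumes every element carries the stw namespace, as in the sample.

```python
import xml.etree.ElementTree as ET

STW_PREFIX = "{http://www.shrinktheweb.com/doc/stwresponse.xsd}"

def status_codes(xml_text):
    """Map each section's local tag name to the text of its StatusCode child."""
    root = ET.fromstring(xml_text)
    codes = {}
    # Loop over every Response in case there can be more than one.
    for response in root.findall(STW_PREFIX + "Response"):
        for section in response:
            code = section.find(STW_PREFIX + "StatusCode")
            if code is not None:  # ThumbnailResult has no StatusCode child
                local_name = section.tag.split("}", 1)[1]
                codes[local_name] = code.text  # None when the element is empty
    return codes

# Shortened copy of the sample response, for illustration only.
sample = (
    '<stw:ThumbnailResponse '
    'xmlns:stw="http://www.shrinktheweb.com/doc/stwresponse.xsd">'
    "<stw:Response>"
    '<stw:ThumbnailResult><stw:Thumbnail Exists="false"></stw:Thumbnail>'
    "</stw:ThumbnailResult>"
    "<stw:ResponseStatus><stw:StatusCode>Blank Detected</stw:StatusCode>"
    "</stw:ResponseStatus>"
    "<stw:ResponseCode><stw:StatusCode></stw:StatusCode></stw:ResponseCode>"
    "<stw:Quota_Remaining><stw:StatusCode>1</stw:StatusCode>"
    "</stw:Quota_Remaining>"
    "</stw:Response>"
    "</stw:ThumbnailResponse>"
)

codes = status_codes(sample)
print(codes["ResponseStatus"])
```

If a later Response repeats a section name, this sketch keeps only the last value; a list per key would be needed to preserve all of them.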