Python and XML Processing
I have used urllib to get the following data:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<videos xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:www="http://www.www.com"">
<video type="cl">
<cd>
<src lang="music">http://www.google.com/ </src>
</cd>
</video>
</videos>
I want to get http://www.google.com/ out. Here is my code:
import xml.etree.ElementTree as etree
data='<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com""><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'
tree = etree.fromstring(data)
geturl=tree.findtext('/video/cd/src').strip()
print geturl
I get this error:
AttributeError: 'NoneType' object has no attribute 'strip'
Obviously, the findtext call failed. I also tried findtext('src'), which doesn't work either.
What's wrong?
Remove the leading forward-slash from the path, so that it reads video/cd/src:
import xml.etree.ElementTree as etree
data = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?><videos xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:www="http://www.www.com"><video type="cl"><cd><src lang="music">http://www.google.com/ </src></cd></video></videos>'''
tree = etree.fromstring(data)  # tree is the root <videos> element
# The path is relative to the root element, so it has no leading slash.
geturl = tree.findtext('video/cd/src').strip()
print(geturl)
yields
http://www.google.com/
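A leading forward-slash makes the path absolute, which is not allowed on elements: find, findall and findtext resolve the path relative to the element they are called on (here the root videos element). In the version used in the question the lookup simply returns None, which is why .strip() fails; newer versions of ElementTree raise a SyntaxError for such a path instead.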
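As a rough sketch of how the path rules play out (assuming the standard-library xml.etree.ElementTree on Python 3; the helper name first_text is made up for illustration and the XML is trimmed to the relevant elements): a bare tag name only matches direct children of the element, which would explain why findtext('src') found nothing, while .// searches all descendants.

```python
import xml.etree.ElementTree as etree

# Trimmed-down version of the XML from the question (namespaces omitted).
data = ('<videos><video type="cl"><cd>'
        '<src lang="music">http://www.google.com/ </src>'
        '</cd></video></videos>')
root = etree.fromstring(data)  # root is the <videos> element


def first_text(elem, path):
    """Hypothetical helper: stripped text of the first match, or None."""
    text = elem.findtext(path)
    return text.strip() if text is not None else None


# 'src' only looks at direct children of <videos>, so nothing matches.
direct = first_text(root, 'src')
# Walk down step by step, relative to the root element.
relative = first_text(root, 'video/cd/src')
# './/src' matches a <src> element at any depth below the root.
anywhere = first_text(root, './/src')

print(direct)
print(relative)
print(anywhere)
```

Checking for None before calling strip() avoids the AttributeError when a path matches nothing.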
PS. There is also a syntax error in the data you posted: xmlns:www="http://www.www.com"" has two double-quotes at the end, so the XML is not well-formed.
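To see the effect of the stray quote, here is a minimal sketch (again assuming the standard-library ElementTree on Python 3; the helper name parses and the shortened XML strings are illustrative only) that checks whether a string parses at all:

```python
import xml.etree.ElementTree as etree

# Shortened stand-ins for the data: one with the doubled quote, one without.
bad = '<videos xmlns:www="http://www.www.com""><video/></videos>'
good = '<videos xmlns:www="http://www.www.com"><video/></videos>'


def parses(xml_text):
    """Hypothetical helper: True if the text is well-formed XML."""
    try:
        etree.fromstring(xml_text)
        return True
    except etree.ParseError:
        return False


bad_ok = parses(bad)    # the extra double-quote is rejected by the parser
good_ok = parses(good)  # the corrected attribute parses cleanly
print(bad_ok, good_ok)
```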