Errors with conditional etree lxml
I'm trying to delete every <question> element whose <answer> is the number -66:
I get the following error: TypeError: argument of type 'NoneType' is not iterable...if element.tag == 'answer' and '-66' in element.text:
What is wrong with that? Any help?
#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*-
from lxml import etree
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>
"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
element.text is None on some iterations. The error says that Python can't search None for "-66", so check that element.text is not None first, like this:
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
The line it's failing on in the XML is <answer></answer>, where there is no text between the tags.
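As a small illustration of why the guard is needed, here is a minimal sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree so that it runs without extra packages; has_marker is a hypothetical helper name, not part of either library.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

empty = ET.fromstring("<answer></answer>")
filled = ET.fromstring("<answer>-66</answer>")

def has_marker(element, marker="-66"):
    # Hypothetical helper: treat a missing text node (None) as an empty
    # string so that the "in" test never receives None.
    return marker in (element.text or "")

print(empty.text)          # None
print(has_marker(empty))   # False
print(has_marker(filled))  # True
```

Writing (element.text or "") is just another spelling of the element.text and ... check above; either form avoids the TypeError.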
Edit (for the second part of your issue about combining tags):
You can use BeautifulSoup like this:
from lxml import etree
import BeautifulSoup
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()
Prints:
<questionaire>
 <question>
  <questiontext>
   What's up?
  </questiontext>
  <answer>
  </answer>
 </question>
</questionaire>
Here is a link where you can download the BeautifulSoup module.
Or, to do this in a more compact way:
from lxml import etree
import BeautifulSoup
# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"
html = etree.fromstring(planhtmlclear_utf)
# let the XPath predicate select the matching questions, then remove each one
for question in html.xpath('/questionaire/question[answer/text()="-66"]'):
    question.getparent().remove(question)
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
An alternative to checking whether element.text is None is to refine your XPath:
questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)
The brackets [...] mean "such that". So
question        # find all question elements
[               # such that
  answer        # it has an answer subelement
  /text()       # whose text
  =             # equals
  "-66"         # "-66"
]
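The same "such that" idea can be tried without lxml. The sketch below is an assumption-laden stand-in, not the code above: it uses Python 3 and the standard library's xml.etree.ElementTree, whose findall() supports only a limited XPath subset. In that subset the predicate is written [answer='-66'], and because there is no getparent(), each match is removed through the root element.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

doc = """
<questionaire>
  <question><questiontext>What's up?</questiontext><answer></answer></question>
  <question><questiontext>Cool?</questiontext><answer>-66</answer></question>
</questionaire>
"""
root = ET.fromstring(doc)

# [answer='-66'] keeps only <question> elements that have an <answer> child
# whose text is exactly "-66". The empty <answer> never matches, so no
# None check is needed.
for question in root.findall("question[answer='-66']"):
    root.remove(question)  # every match is a direct child of root

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(remaining)  # ["What's up?"]
```

Note that the predicate tests for equality, whereas the loop with '-66' in element.text is a substring test; an answer such as "x-66" would be removed by the loop but not by the predicate. In full XPath 1.0, contains() is the function for substring matching.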