XPath match every node containing text
How do I match all child nodes containing text recursively.
If I have a tree like
table
tr
td
"hello"
td
b
"hi"
tr
td
"salud"
td
em
"bonjour"
开发者_运维技巧
How do I match every single string within the table node with xpath? Something like "//table/*/text()"?
The XPath expression you gave was almost correct already:
//table//text()
will get you all text nodes within all tables in the document.
How about the following?
from lxml import etree
from StringIO import StringIO
input = '''
<table>
<tr>
<td>hello</td>
<td><b>hi</b></td>
</tr>
<tr>
<td>salud</td>
<td><em>bonjour</em></td>
</tr>
</table>
'''
parser = etree.HTMLParser()
tree = etree.parse(StringIO(input), parser)
for p in tree.xpath("//table/tr/td//text()"):
print p
... which gives the output:
hello
hi
salud
bonjour
精彩评论