Finding inline style with lxml.cssselector
I'm new to this library (and, sadly, no more familiar with BeautifulSoup either), and I'm trying to do something very simple: select elements by their inline style. Given markup like:
<td style="padding: 20px">blah blah </td>
I just want to select all tds where style="padding: 20px", but I can't figure out how. All the examples I've found only show how to select every td, such as:
for col in page.cssselect('td'):
but that doesn't help me much.
Well, there's a better way: XPath.
import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.xpath("//td[@style='padding: 20px']"):
    print(col.text)
That is neater, and usually faster than filtering in a Python loop, because the matching is done inside lxml's XPath engine.
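If lxml isn't available, the same attribute predicate also works with the standard library's xml.etree.ElementTree. This is a swapped-in library, not what the answer above uses, and it is only a minimal sketch under two assumptions: ElementTree parses well-formed XML only (it has no HTML parser), so the cells are wrapped in a table/tr to give a single root element; and its findall() understands only a subset of XPath, which happens to include the [@attr='value'] predicate.

```python
import xml.etree.ElementTree as ET

# Assumption: well-formed markup, so the td cells are wrapped in table/tr.
data = """<table><tr>
<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
</tr></table>"""

root = ET.fromstring(data)

# ElementPath supports [@attr='value'] (an exact string comparison),
# but not XPath functions such as contains().
matches = [td.text.strip() for td in root.findall(".//td[@style='padding: 20px']")]

for text in matches:
    print(text)
```

As with the lxml version, the comparison is on the whole attribute string, so a cell styled padding:20px (no space) would not be returned.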
If you prefer to use CSS selectors (recent lxml versions need the separate cssselect package installed for cssselect() to work):
import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for td in doc.cssselect('td[style="padding: 20px"]'):
    print(td.text)
Note that Ruslan Spivak and nosklo have both given better answers on this page.
import lxml.html
data = """<td style="padding: 20px">blah blah </td>
<td style="padding: 21px">bow bow</td>
<td style="padding: 20px">buh buh</td>
"""
doc = lxml.html.document_fromstring(data)
for col in doc.cssselect('td'):
    # .get() returns None instead of raising KeyError when a td has no style
    style = col.get('style')
    if style == 'padding: 20px':
        print(col.text.strip())
prints
blah blah
buh buh
and manages to skip bow bow.
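All three approaches compare the whole style string, so a cell written as padding:20px or padding: 20px; would be missed. If that matters, one option is to parse the attribute into property/value pairs and compare those instead. The helpers parse_inline_style and has_inline_style below are hypothetical, not part of lxml, and the parsing is deliberately simplified: it splits on ";" and on the first ":", so it does not handle semicolons inside quoted strings or url(...).

```python
def parse_inline_style(style):
    """Parse an inline style string into a {property: value} dict.

    Simplified sketch: splits declarations on ';' and each declaration on
    its first ':', so ';' inside quoted strings or url(...) is not handled.
    """
    declarations = {}
    for part in (style or "").split(";"):
        name, sep, value = part.partition(":")
        if not sep:
            continue  # skip empty or malformed declarations
        # Property names are case-insensitive; collapse runs of whitespace in values.
        declarations[name.strip().lower()] = " ".join(value.split())
    return declarations


def has_inline_style(style, **wanted):
    """Return True if every wanted property is declared with the given value.

    Underscores in keyword names stand in for hyphens (padding_left -> padding-left).
    """
    declarations = parse_inline_style(style)
    return all(
        declarations.get(prop.replace("_", "-")) == value
        for prop, value in wanted.items()
    )


# These three spellings all declare the same padding.
for style in ["padding: 20px", "padding:20px;", "color: red; PADDING:  20px"]:
    print(style, "->", has_inline_style(style, padding="20px"))
```

In the lxml loop from the last answer, the test would become if has_inline_style(col.get('style'), padding='20px'):, which also copes with cells that have no style attribute at all.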