Is there a clean way to get the n-th column of an html table using BeautifulSoup?

Say we look at the first table in a page, so:

table = BeautifulSoup(...).table

the rows can be scanned with a clean for-loop:

for row in table:
    f(row)

But for getting a single column things get messy.

My question: is there an elegant way to extract a single column, either by its position or by its 'name' (i.e. the text that appears in the first row of that column)?


lxml is typically many times faster than BeautifulSoup, so you might want to use that instead.

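from lxml.html import parse
# parse() accepts a filename, file object, or URL; getroot() returns the <html> element.
# Note: .cssselect() relies on the separate cssselect package being installed.
doc = parse('http://python.org').getroot()
for row in doc.cssselect('table > tr'):
    # nth-child is 1-based, so this picks the third cell of each row.
    for cell in row.cssselect('td:nth-child(3)'):
        print(cell.text_content())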

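Or, with a list comprehension instead of explicit loops:

rows = doc.cssselect('table > tr')
# cssselect() is a method of each row element, not of the list, so iterate over the rows.
cells = [cell.text_content() for row in rows for cell in row.cssselect('td:nth-child(3)')]
print(cells)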
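
The question also asks about picking a column by its header text, which the snippets above don't cover. Here is a minimal sketch of one way that could be done with lxml, assuming the first row of the table holds the column names and that no cell uses colspan/rowspan or contains a nested table. The helper names (_cells, column_by_position, column_by_name) and the sample page are made up for illustration, and it uses .xpath() rather than .cssselect() so it doesn't need the extra cssselect package.

```python
from lxml import html

# Made-up sample page; in practice this would be the document parsed above.
SAMPLE = """<html><body>
<table>
  <tr><th>Name</th><th>Language</th><th>Year</th></tr>
  <tr><td>Guido</td><td>Python</td><td>1991</td></tr>
  <tr><td>Larry</td><td>Perl</td><td>1987</td></tr>
</table>
</body></html>"""


def _cells(row):
    """Return the direct <th>/<td> children of a row, in document order."""
    return row.xpath('./th | ./td')


def column_by_position(table, index):
    """Text of the cell at zero-based `index` in every row that has one."""
    column = []
    for row in table.xpath('.//tr'):
        cells = _cells(row)
        if index < len(cells):
            column.append(cells[index].text_content().strip())
    return column


def column_by_name(table, name):
    """Text of the column whose first-row cell equals `name`, header excluded."""
    rows = table.xpath('.//tr')
    header = [cell.text_content().strip() for cell in _cells(rows[0])]
    index = header.index(name)  # raises ValueError if no such header
    return column_by_position(table, index)[1:]


doc = html.fromstring(SAMPLE)
table = doc.xpath('//table')[0]  # first table in the page
print(column_by_name(table, 'Language'))
print(column_by_position(table, 2))
```

With the sample table, column_by_name(table, 'Language') gives ['Python', 'Perl'], while column_by_position(table, 2) still includes the header cell: ['Year', '1991', '1987']. Using './/tr' rather than 'table > tr' means the rows are found whether or not the markup wraps them in a tbody.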