
Beautiful Soup - how to parse a table's columns and insert them into two lists

I am trying to parse a table with two columns and insert the text from each column into two lists.

I need some ideas on how to do it.

from BeautifulSoup import BeautifulSoup

s = """<table><tr><td valign="top" width="25%"><b>Text1</b><a href="#">Link1</a>:</b></td><td>AAAA<a href="#">BBBB</a></td></tr>
<tr><td valign="top" width="25%"><b>Text2:</b></td><td>CCCC<a href="#">DDDD</a></td></tr>
<tr><td valign="top" width="25%"><b><a href="#">Link2</a>:</b></td><td><a href="#">EEEE</a> FFFF</td></tr>
<tr><td valign="top" width="25%"><b>Text3 <br> Text4:</b></td><td><a href="#">EEEE</a> FFFF</td></tr></table>"""

a = BeautifulSoup(s)

b = a.findAll('td', text=True)

left = []
right = []

for i in b:
    print i

What I get:

Text1

Link1

:

AAAA

BBBB

What I need:

left = ["Text1", "Link1"]

right = ["AAAA", "BBBB"]


Get the row first, and then get the cell:

left = []
right = []

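for tr in a.findAll('tr'):
    # Each row is expected to hold exactly two cells: left and right.
    l, r = tr.findAll('td')
    # findAll(text=True) returns every text node inside the cell.
    left.extend(l.findAll(text=True))
    right.extend(r.findAll(text=True))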

I haven't tested this, but I'm pretty sure it should work :)

EDIT: fixed (hopefully)
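
For anyone on Python 3 with Beautiful Soup 4 (the bs4 package), here is a rough sketch of the same row-then-cell idea. It assumes every row of interest has exactly two td cells, and it also drops the bare ":" separators, which the desired output in the question seems to leave out. The function name split_columns and the trimmed-down sample markup are only for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample based on the first two rows from the question.
html = """<table>
<tr><td><b>Text1</b><a href="#">Link1</a>:</td><td>AAAA<a href="#">BBBB</a></td></tr>
<tr><td><b>Text2:</b></td><td>CCCC<a href="#">DDDD</a></td></tr>
</table>"""


def split_columns(markup):
    """Collect the text of the first and second cell of each row separately."""
    soup = BeautifulSoup(markup, "html.parser")
    left, right = [], []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 2:
            continue  # assumption: only two-cell rows are of interest
        for cell, target in zip(cells, (left, right)):
            # stripped_strings yields each text node with whitespace trimmed
            for text in cell.stripped_strings:
                text = text.strip(":").strip()  # drop leading/trailing colons
                if text:
                    target.append(text)
    return left, right


left, right = split_columns(html)
print(left)   # ['Text1', 'Link1', 'Text2']
print(right)  # ['AAAA', 'BBBB', 'CCCC', 'DDDD']
```

Rows that don't have exactly two cells are skipped here rather than raising an error, which is a design choice for the sketch; unpacking with l, r = ... as above would raise a ValueError on such rows instead.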
