Using BeautifulSoup, how do I reference table rows in an HTML page?
I have an HTML page that looks like this:
<html>
..
<form post="/products.hmlt" ..>
..
<table ...>
<tr>...</tr>
<tr>
<td>part info</td>
..
</tr>
</table>
..
</form>
..
</html>
I tried:
form = soup.findAll('form')
table = form.findAll('table') # table inside form
But I get an error saying:
ResultSet object has no attribute 'findAll'
I guess the call to findAll doesn't return a BeautifulSoup object? What can I do instead?
Update
There are many tables on this page, but only one table inside the <form> tag shown above.
findAll returns a list, so extract the element first:
form = soup.findAll('form')[0]
table = form.findAll('table')[0] # table inside form
Of course, you should do some error checking (i.e. make sure it's not empty) before indexing into the list.
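The error checking mentioned above could look something like the sketch below. It assumes the newer bs4 package and Python 3 rather than the BeautifulSoup 3 import used elsewhere in this thread; in bs4, find returns a single tag or None, and find_all is the preferred spelling of findAll. The helper name rows_in_form_table and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample page: one table outside the form, one inside it.
SAMPLE_HTML = """
<html><body>
<table><tr><td>unrelated table</td></tr></table>
<form action="/products.html">
  <table>
    <tr><td>header row</td></tr>
    <tr><td>part info</td></tr>
  </table>
</form>
</body></html>
"""


def rows_in_form_table(html):
    """Return the <tr> tags of the first table inside the first form.

    Hypothetical helper: returns an empty list when the form or the
    table is missing, instead of raising on a failed lookup.
    """
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")  # a single Tag, or None if there is no form
    if form is None:
        return []
    table = form.find("table")  # searches only inside the form
    if table is None:
        return []
    return table.find_all("tr")


rows = rows_in_form_table(SAMPLE_HTML)
cell_texts = [row.get_text(strip=True) for row in rows]
print(cell_texts)
```

Using find instead of findAll(...)[0] avoids an IndexError on an empty result, at the cost of having to check for None explicitly.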
I like ars's answer, and certainly agree with the need for error checking, especially if this is going to be used in any kind of production code.
Here's perhaps a more explicit way of finding the data you seek:
from BeautifulSoup import BeautifulSoup as bs  # BeautifulSoup 3 (Python 2)
html = '''<html><body><table><tr><td>some text</td></tr></table>
<form><table><tr><td>some text we care about</td></tr>
<tr><td>more text we care about</td></tr>
</table></form></body></html>'''
soup = bs(html)
# soup.form is the first <form> tag; findAll('tr') searches only inside it,
# so the table outside the form is skipped
for tr in soup.form.findAll('tr'):
    print tr.text
# output:
# some text we care about
# more text we care about
For reference here is the cleaned-up HTML:
>>> print soup.prettify()
<html>
 <body>
  <table>
   <tr>
    <td>
     some text
    </td>
   </tr>
  </table>
  <form>
   <table>
    <tr>
     <td>
      some text we care about
     </td>
    </tr>
    <tr>
     <td>
      more text we care about
     </td>
    </tr>
   </table>
  </form>
 </body>
</html>
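Since the page has many tables but only one inside the form, a CSS selector is another way to express "rows of the table inside the form". This is only a sketch, assuming the newer bs4 package (whose select method takes CSS selectors) and Python 3; the variable name form_rows is illustrative.

```python
from bs4 import BeautifulSoup

html = """<html><body><table><tr><td>some text</td></tr></table>
<form><table><tr><td>some text we care about</td></tr>
<tr><td>more text we care about</td></tr>
</table></form></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# "form table tr" matches every <tr> nested in a <table> nested in a <form>;
# rows of the table outside the form are not matched.
form_rows = [tr.get_text(strip=True) for tr in soup.select("form table tr")]
print(form_rows)
```

select returns an empty list when nothing matches, so the loop simply does nothing on a page without such a table.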