How can I get all html following a searched item using BeautifulSoup in Python? [closed]
I am trying to return all of the html after a search text string using BeautifulSoup in Python. Here is my code:
html = '<html>table1<table><tr>text1<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>'
soup = BeautifulSoup(''.join(html))
foundtext = soup.find(text='text1')
soup2 = foundtext.findAll()
This code gives me an error. In soup2, I would like to have:
<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>
which is all html code following the phrase 'text1'.
The following code will print out the nodes after the first occurrence of text1:
from BeautifulSoup import BeautifulSoup, NavigableString

html = '<html>table1<table><tr>text1<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>'
soup = BeautifulSoup(html)
found = False
# Walk every node in document order; start printing once 'text1' has been seen.
for node in soup.recursiveChildGenerator():
    if found:
        print(node)
    # The check comes after the print, so 'text1' itself is not printed.
    if isinstance(node, NavigableString) and node == 'text1':
        found = True
> suxmac2:tmp ajung$ bin/python out
> <td>text2</td>
> text2
> table2
> <table><tr>text3<td>text4</td></tr></table>
> <tr>text3<td>text4</td></tr>
> text3
> <td>text4</td>
> text4
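For readers on the newer bs4 package with Python 3, the same traversal can probably be written with the next_elements generator instead of a manual flag. This is only a sketch under assumptions: bs4 is installed, and the built-in 'html.parser' is chosen here; other parsers such as lxml or html5lib may restructure this malformed markup and yield different nodes.

```python
from bs4 import BeautifulSoup, NavigableString

html = ('<html>table1<table><tr>text1<td>text2</td></tr></table>'
        'table2<table><tr>text3<td>text4</td></tr></table></html>')

# Assumption: the built-in parser, which keeps the markup as written.
soup = BeautifulSoup(html, 'html.parser')

# find(string=...) returns the first text node equal to 'text1'.
start = soup.find(string='text1')
if start is None:
    raise ValueError("'text1' not found in the document")

# next_elements yields every node that follows `start` in document order,
# tags and text nodes alike; `start` itself is not included.
following = list(start.next_elements)

for node in following:
    print(node)
```

As with the loop above, this gives the nodes that follow text1, not a single HTML string: nested tags appear once as a whole and again as their children.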
Adjusting the code to your further needs is up to you; we have already helped you several times. Once again: read the BeautifulSoup documentation. You have been given the link numerous times by now.
I believe that is not possible, as BeautifulSoup keeps the parsed HTML as a tree structure. What you could do is to extract all unwanted elements using http://www.crummy.com/software/BeautifulSoup/documentation.html#Removing%20elements , which would return the HTML in front of your search string as well.
Apart from that, you could also work from the HTML of the element that you searched for. As the BeautifulSoup documentation shows, find returns the matching node, which can be rendered as an HTML string with str(). Use that and simple Python string-searching methods to cut away everything up to the end of the found string. That will probably require more manual work, and it basically amounts to combining the other answer to this question with BeautifulSoup's search method.
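The string-cutting idea can be sketched with nothing but the standard library, applied here to the raw HTML from the question rather than to a parsed element. The helper name html_after is made up for this illustration, and the approach is plain text matching: it assumes the marker occurs literally in the markup and does not care whether it sits in text, in a tag name, or in an attribute.

```python
def html_after(html, marker):
    """Return everything in `html` after the first occurrence of `marker`.

    Hypothetical helper for illustration; it is plain string slicing and
    does not parse the HTML in any way.
    """
    before, found, after = html.partition(marker)
    if not found:
        # partition() returns an empty separator when the marker is absent.
        raise ValueError('marker %r not found' % marker)
    return after


html = ('<html>table1<table><tr>text1<td>text2</td></tr></table>'
        'table2<table><tr>text3<td>text4</td></tr></table></html>')

rest = html_after(html, 'text1')
print(rest)
```

The result is an unbalanced fragment (it starts inside a table row), so it is suitable for display or further string processing but may be repaired unpredictably if fed back into a parser.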