Delete rest of HTML file after some text
I am scraping an HTML file using BeautifulSoup in Python. I want to delete everything that comes after a certain piece of text.
Ex:
<div class="content">
<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>
</div>
I want to delete everything after Page 3, so that the result is:
<div class="content">
<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
</div>
I have tried the following:
p = soup.findAll('p')
if len(p) > 3:
    d = p[3]
    while d:
        e = d.next
        d.extract()
        d = e
Replacing d.extract() with del(d) does not work either.
Please help.
Try this:
p = soup.findAll('p')
while len(p) > 3:
    last_p = p.pop()
    last_p.extract()
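If the cut-off point is known by its text rather than by its position, a variation on the same idea may help. The following is only a minimal sketch: it assumes the bs4 package (BeautifulSoup 4) with the built-in "html.parser", and the helper name remove_after is made up for illustration. It only removes the <p> siblings that follow the marker paragraph, not arbitrary content later in the file.

```python
from bs4 import BeautifulSoup

html = """<div class="content">
<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>
</div>"""


def remove_after(soup, marker):
    """Hypothetical helper: drop every <p> sibling that follows the
    first <p> whose text contains `marker`."""
    anchor = soup.find(lambda tag: tag.name == "p" and marker in tag.get_text())
    if anchor is None:
        return soup  # marker not found, leave the document untouched
    # find_next_siblings returns a list, so removing tags while
    # looping over it does not disturb the iteration.
    for sibling in anchor.find_next_siblings("p"):
        sibling.decompose()
    return soup


soup = BeautifulSoup(html, "html.parser")
remove_after(soup, "Page 3")
remaining = [p.get_text(strip=True) for p in soup.find_all("p")]
print(remaining)  # ['Page 1', 'Page 2', 'Page 3']
```

Collecting the siblings into a list first avoids walking the tree with .next while tags are being removed from it, which is where the loop in the question runs into trouble.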