Delete rest of HTML file after some text

I am scraping an HTML file using BeautifulSoup in Python. I want to delete everything after a certain word is found.

Ex:

<div class="content">

<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>

</div>

I want to delete everything after Page 3, so that the result is:

<div class="content">

<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>

</div>

I have tried the following:

p = soup.findAll('p')
if len(p) > 3:
    d = p[3]
    while d:
        e = d.next
        d.extract()
        d = e

Replacing d.extract() with del(d) does not work either. Please help.
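A likely reason the loop stops early: `d.next` (called `next_element` in BeautifulSoup 4) is the next node in parse order, and for `<p> Page 4 </p>` that node is the tag's own text, not the following `<p>`. `extract()` detaches the tag together with that text and cuts the chain, so on the second pass `e` is `None` and the loop ends after removing only Page 4. `del(d)` only unbinds the Python name and never touches the tree. The sketch below reproduces this with bs4 and its built-in `html.parser`; `original_attempt` and `HTML` are illustrative names, not part of the question.

```python
from bs4 import BeautifulSoup

HTML = """<div class="content">
<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>
</div>"""


def original_attempt(html):
    """Illustrative helper: the question's loop, using bs4's next_element (BS3's .next)."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.find_all("p")
    if len(p) > 3:
        d = p[3]  # <p> Page 4 </p>
        while d:
            # next_element is the next node in parse order: for <p> Page 4 </p>
            # that is its own text " Page 4 ", not the following <p>.
            e = d.next_element
            # extract() removes d with its text and sets the text's
            # next_element to None, so the next pass ends the loop.
            d.extract()
            d = e
    return [tag.get_text(strip=True) for tag in soup.find_all("p")]


print(original_attempt(HTML))  # ['Page 1', 'Page 2', 'Page 3', 'Page 5']
```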


Try this:

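p = soup.findAll('p')
# Pop paragraphs off the end of the list and remove each one from the
# tree until only the first three (Page 1 to Page 3) remain.
while len(p) > 3:
    last_p = p.pop()
    last_p.extract()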
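The answer above counts paragraphs. If the cut point should instead be located by its text, as the question title suggests, a minimal sketch with bs4 could find the `<p>` whose stripped text matches and then extract every node that follows it inside the same parent. `truncate_after` is a hypothetical helper name, and matching on the whole stripped paragraph text is an assumption about what "find a word" means here.

```python
from bs4 import BeautifulSoup

HTML = """<div class="content">
<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>
</div>"""


def truncate_after(soup, marker_text):
    """Hypothetical helper: remove everything after the <p> whose text is marker_text.

    Only siblings inside the marker's parent are removed.
    Returns True if the marker was found, False otherwise.
    """
    # Compare stripped text so the padding in "<p> Page 3 </p>" does not matter.
    marker = next(
        (p for p in soup.find_all("p") if p.get_text(strip=True) == marker_text),
        None,
    )
    if marker is None:
        return False
    # Copy the generator into a list first: extracting a node while iterating
    # next_siblings would cut the sibling chain after the first removal.
    for sibling in list(marker.next_siblings):
        sibling.extract()  # works for tags and for whitespace text nodes alike
    return True


soup = BeautifulSoup(HTML, "html.parser")
truncate_after(soup, "Page 3")
print(soup)
```

Because only the marker's siblings are touched, anything after the enclosing `div` survives; to cut the rest of the whole document you would likely repeat the sibling removal for each ancestor of the marker.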