Beautiful Soup - handling errors

  1. How do I handle the case where there is no href after <strong>Text:</strong>?

  2. Is there a better way to search for the content that exists after <strong>Contact:</strong>?

http://pastebin.com/FYMxTJkf


How about findNext?

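from BeautifulSoup import BeautifulSoup

html = '''<strong>Text:</strong>   

        <a href='http://domain.com'>url</a>'''

soup = BeautifulSoup(html)
# Locate the label, then the first <a> tag that follows it in the document.
label = soup.find("strong", text='Text:')
contact = label.findNext('a') if label is not None else None

# findNext returns None when no <a> follows, so check that before calling get().
if contact is not None and contact.get('href') is not None:
    print(contact)
else:
    print("No href")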

If you're looking specifically for an a tag with an href, use:

contact = label.findNext('a', attrs={'href' : True})

With this you won't need to condense whitespace. I imagine you did this because next was returning the whitespace after the label.
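
For anyone on the newer bs4 package, here is a rough sketch of the same idea wrapped in a function that returns None instead of raising when the label or the link is missing. The function name href_after_label is made up for illustration, and it assumes bs4 4.4 or later (for the string= argument) with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup


def href_after_label(html, label_text):
    """Return the href of the first <a href=...> after the <strong> label, or None.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Find the <strong> tag whose only text is exactly label_text.
    label = soup.find("strong", string=label_text)
    if label is None:
        return None  # the label itself is missing

    # href=True matches only <a> tags that actually carry an href attribute.
    link = label.find_next("a", href=True)
    return link["href"] if link is not None else None


if __name__ == "__main__":
    sample = "<strong>Text:</strong>   \n\n   <a href='http://domain.com'>url</a>"
    print(href_after_label(sample, "Text:"))
```

Note that find_next keeps searching through the rest of the document, so if the link is missing for one label it may pick up a link that belongs to a later one. If that matters for your page, you would need to limit the search to the label's own container.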
