How can I use BeautifulSoup to find all the links in a page pointing to a specific domain?
Use SoupStrainer:
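from BeautifulSoup import BeautifulSoup, SoupStrainer
import re

# doc is assumed to hold the page's HTML as a string
# Find all links
links = SoupStrainer('a')
all_links = [tag for tag in BeautifulSoup(doc, parseOnlyThese=links)]

# Find only the links whose href contains the domain (note the escaped dot)
linkstodomain = SoupStrainer('a', href=re.compile(r'example\.com/'))
domain_links = [tag for tag in BeautifulSoup(doc, parseOnlyThese=linkstodomain)]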
Edit: Modified the example from the official documentation.
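
One caveat with a substring pattern such as 'example.com/': it also matches hosts like notexample.com, as well as URLs that only carry the domain in a query string. Below is a minimal sketch of a stricter check, assuming Python 3 and the standard urllib.parse module. The helper name points_to_domain and its default domain are hypothetical and not part of BeautifulSoup.

```python
from urllib.parse import urlparse


def points_to_domain(href, domain="example.com"):
    """Return True if href is an absolute URL whose host is `domain`
    or one of its subdomains. Hypothetical helper for illustration."""
    if not href:
        # Anchors without an href attribute are passed in as None.
        return False
    try:
        host = urlparse(href).hostname
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are treated as non-matches.
        return False
    if host is None:
        # Relative links such as "/about" have no host component.
        return False
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    for url in ("http://example.com/a",
                "https://www.example.com/",
                "http://notexample.com/",
                "http://other.org/?u=example.com/"):
        print(url, points_to_domain(url))
```

BeautifulSoup accepts a callable as an attribute filter, so a helper like this should be usable in place of the regex, for example SoupStrainer('a', href=points_to_domain). Note that it deliberately skips relative links, which point back to whatever site served the page.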