Parse HTML tags based on a class and href attribute using Beautiful Soup
I am trying to parse HTML with BeautifulSoup.
The content I want is like this:
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
I tried this and got the following error:
maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
File "<ipython console>", line 1
maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
^
SyntaxError: invalid syntax
What I want is the string: http://some-web-url/
soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']
To find all such links:
for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:
        pass
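For reference, here is a self-contained version of the loop above. It is only a minimal sketch: it assumes the newer bs4 package (where findAll is also available as find_all), and SAMPLE_HTML is a hypothetical input modeled on the snippet in the question. It uses link.get("href"), which returns None for a missing attribute, in place of the try/except.

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Hypothetical sample input modeled on the question's snippet.
SAMPLE_HTML = """
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" id="no-href">No link here</a>
<a class="other" href="http://unrelated/">Other</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tag.get() returns None when the attribute is absent, so no KeyError handling is needed.
hrefs = [
    link.get("href")
    for link in soup.find_all("a", {"class": "yil-biz-ttl"})
    if link.get("href") is not None
]
print(hrefs)
```

The second and third anchors in the sample are there only to show that tags without an href, or with a different class, are skipped.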
You're missing a closing quote after "class:
maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
should be
maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
Also, I don't think you can search for an attribute like href that way; I think you need to search for a tag:
maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
To find all <a/> elements with the CSS class "yil-biz-ttl" that have an href attribute with anything in it:
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(HTML, "html.parser")  # HTML is the page source as a string
for link in soup("a", "yil-biz-ttl", href=True):
    print(link['href'])
At the moment, none of the other answers satisfy the above requirements.
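The same requirement (an <a> tag with that class and an href attribute present) can also be written as a CSS selector. This is an alternative technique, not the one used in the answer above, sketched here under the assumption that bs4's select() method is available; SAMPLE_HTML is a hypothetical input modeled on the question's snippet.

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Hypothetical sample input modeled on the question's snippet.
SAMPLE_HTML = """
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" id="no-href">No link here</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "a.yil-biz-ttl[href]" matches <a> tags that carry the class and have an href attribute.
selected = [link["href"] for link in soup.select("a.yil-biz-ttl[href]")]
print(selected)
```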
Well, first of all, you have a syntax error: the quotes in the "class" part are wrong.
Try:
maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
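Note that fixing the quotes only removes the SyntaxError. The first argument to findAll is a tag name, so passing "href" looks for <href> tags and matches nothing in the question's markup. The sketch below illustrates the difference; it assumes the bs4 package and uses a hypothetical SAMPLE_HTML string copied from the question.

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Hypothetical sample input copied from the question's snippet.
SAMPLE_HTML = (
    '<a class="yil-biz-ttl" id="yil_biz_ttl-2" '
    'href="http://some-web-url/" title="some title">Title</a>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching for a tag named "href" finds nothing: href is an attribute, not a tag.
by_href_name = soup.find_all("href", {"class": "yil-biz-ttl"})

# Searching for the <a> tag and then reading its href attribute gives the URL.
by_a_name = soup.find_all("a", {"class": "yil-biz-ttl"})
first_href = by_a_name[0]["href"]

print(len(by_href_name), first_href)
```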