Parse HTML tags based on a class and href attribute using Beautiful Soup

I am trying to parse HTML with BeautifulSoup.

The content I want is like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>

I tried the following and got this error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/


soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:
        pass


You're missing a close-quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

should be

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Also, I don't think you can search for an attribute such as href that way. The first argument to findAll is a tag name, so you need to search for the a tag and then read its href:

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
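
A quick way to check this, sketched with the newer bs4 package (find_all is the bs4 spelling of findAll) and a one-line sample built from the question's markup: searching for a tag named "href" matches nothing, while searching for the a tag returns the link.

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Sample markup adapted from the question.
HTML = '<a class="yil-biz-ttl" href="http://some-web-url/" title="some title">Title</a>'
soup = BeautifulSoup(HTML, "html.parser")

# "href" is treated as a tag name here, and the document has no <href> tag.
no_match = soup.find_all("href", {"class": "yil-biz-ttl"})

# Searching for the <a> tag, then reading the attribute from each match.
by_tag = [link["href"] for link in soup.find_all("a", {"class": "yil-biz-ttl"})]

print(len(no_match))
print(by_tag)
```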


To find all <a> elements with the CSS class "yil-biz-ttl" that have an href attribute with any value:

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(HTML, "html.parser")  # HTML is a string holding the page source
for link in soup("a", "yil-biz-ttl", href=True):
    print(link['href'])

At the moment, none of the other answers satisfy the above requirements.
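
The same filter can also be written as a CSS selector. This is only a sketch, assuming a bs4 version whose select() supports attribute selectors (recent releases do, through the bundled soupsieve dependency); the sample markup beyond the question's first link is made up:

```python
from bs4 import BeautifulSoup  # assumes: pip install beautifulsoup4

# Hypothetical sample markup: one matching link, one without href, one with another class.
HTML = """
<a class="yil-biz-ttl" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" title="no link">No href</a>
<a class="other" href="http://unrelated/">Other</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "a.yil-biz-ttl[href]" means: <a> tags with class yil-biz-ttl that carry an href attribute.
selected = [a["href"] for a in soup.select("a.yil-biz-ttl[href]")]

print(selected)
```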


Well, first of all, you have a syntax error: the quotes around the class key are wrong.

Try:

maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
