Is it possible for BeautifulSoup to work in a case-insensitive manner?

2022-12-26 08:40

I am trying to extract the meta description from fetched web pages, but I am running into BeautifulSoup's case sensitivity.

Some of the pages have <meta name="Description" and some have <meta name="description".

My problem is very similar to another question on Stack Overflow.

The only difference is that I can't use lxml; I have to stick with BeautifulSoup.

You can give BeautifulSoup a regular expression to match attributes against. Something like

soup.findAll('meta', name=re.compile("^description$", re.I))

might do the trick. Cribbed from the BeautifulSoup docs.

A regular expression? Now we have another problem.

Instead, you can pass in a lambda:

soup.findAll(lambda tag: tag.name.lower()=='meta',
    attrs={'name': lambda x: x and x.lower()=='description'})

(The x and guard avoids an exception when the tag has no name attribute.)
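
To see why that guard matters, here is a minimal sketch. It assumes bs4 with the built-in html.parser, and the HTML snippet and the is_description helper are made up for illustration. Beautiful Soup passes None to an attribute filter when a tag lacks the attribute, so calling .lower() without a check would fail on the charset tag below. The filter goes through attrs here because name is also the first parameter of find_all.

```python
from bs4 import BeautifulSoup

# Made-up sample: one <meta> without a name attribute, one with mixed case.
html = """
<html><head>
<meta charset="utf-8">
<meta name="Description" content="A sample page">
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def is_description(value):
    # value is None for tags that have no name attribute,
    # so check for that before calling .lower().
    return value is not None and value.lower() == "description"


matches = soup.find_all("meta", attrs={"name": is_description})
contents = [tag.get("content") for tag in matches]
print(contents)
```

Using an explicit "is not None" check instead of "x and" also makes the filter return a plain bool, which some readers may find easier to follow.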

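With minor changes it works. The name keyword collides with findAll's first parameter (the tag name), so the attribute filter has to be passed through attrs: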

soup.findAll('meta', attrs={'name':re.compile("^description$", re.I)})
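
As a rough end-to-end sketch of the regular expression approach, assuming bs4 and the built-in html.parser (so lxml is not needed): the sample pages and the get_description helper are hypothetical and only illustrate the call.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical pages that spell the attribute value differently.
pages = [
    '<head><meta name="Description" content="First page"></head>',
    '<head><meta name="description" content="Second page"></head>',
    '<head><meta name="DESCRIPTION" content="Third page"></head>',
]

# Compile once; re.IGNORECASE is the long form of re.I.
pattern = re.compile(r"^description$", re.IGNORECASE)


def get_description(html):
    """Return the content of the description meta tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": pattern})
    return tag.get("content") if tag else None


descriptions = [get_description(page) for page in pages]
print(descriptions)
```

The ^ and $ anchors keep the pattern from matching longer values such as "og:description".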

With bs4 use the following:

soup.find('meta', attrs={'name': lambda x: x and x.lower()=='description'})

Better still, use a CSS attribute selector with the i flag, which makes the value match case-insensitive:

soup.select('meta[name="description" i]')
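
A small sketch of the selector in use, assuming a bs4 version that ships with the soupsieve selector engine (4.7 or later, as far as I know). The HTML snippet is made up. The second lookup omits the i flag to show the contrast.

```python
from bs4 import BeautifulSoup

# Made-up sample with an upper-case attribute value.
html = """
<head>
<meta name="Keywords" content="python, scraping">
<meta name="DESCRIPTION" content="An upper-case example">
</head>
"""

soup = BeautifulSoup(html, "html.parser")

# The trailing i makes the value comparison case-insensitive.
tag = soup.select_one('meta[name="description" i]')
description = tag["content"] if tag is not None else None

# Without the flag the value is compared case-sensitively.
strict = soup.select_one('meta[name="description"]')

print(description)
print(strict)
```

select_one returns the first match or None, which is convenient when a page should have only one description tag; select returns a list of all matches.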

Change the case of the HTML page source before parsing it, using string methods such as lower() or upper(). Note that this also changes the case of the extracted text and attribute values.
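
A quick sketch of that trade-off, assuming bs4 with html.parser and a made-up snippet: lowercasing the source makes a plain string match work, but the description text itself loses its original case.

```python
from bs4 import BeautifulSoup

# Made-up sample page source.
html = '<head><meta name="Description" content="Hello World"></head>'

# Lowercasing the whole source lets a plain string filter match...
lowered = BeautifulSoup(html.lower(), "html.parser")
tag = lowered.find("meta", attrs={"name": "description"})
lowered_content = tag["content"]

# ...but the content attribute has been lowercased as well.
print(lowered_content)  # hello world
```

If the original casing of the description matters, one of the filter-based approaches is probably the safer choice.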
