
Getting BeautifulSoup to catch tags in a non-case-sensitive way

I want to catch some tags with BeautifulSoup: some <p> tags, the <title> tag, and some <meta> tags. But I want to catch them regardless of their case; I know that some sites write meta like this: <META>, and I want to be able to catch that.

I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a non-case-sensitive way?


BeautifulSoup standardises the parse tree on input: it converts tag names to lower case, so <META> and <meta> both end up as meta in the tree. You don't have anything to worry about, IMO.
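
To see this for yourself, here is a minimal sketch. It assumes the newer bs4 package (BeautifulSoup 4) on Python 3 with the standard-library html.parser backend, none of which is named in the question, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: bs4 (BeautifulSoup 4) is installed

# Made-up markup with upper-case and mixed-case tag names.
html = '<HTML><HEAD><META NAME="keywords" CONTENT="HTML, CSS"><Title>Test</Title></HEAD></HTML>'

soup = BeautifulSoup(html, "html.parser")

# Tag names are stored lower-cased, whatever case the source used.
tag_names = [tag.name for tag in soup.find_all(True)]
print(tag_names)

# So a lower-case lookup also finds the upper-case <META>.
meta = soup.find("meta")
print(meta["content"])
```

Note that only tag and attribute names are normalised here; attribute values such as "keywords" keep whatever case the page used.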


You can use soup.findAll, which should match tag names case-insensitively (this example uses BeautifulSoup 3 on Python 2):

import BeautifulSoup

html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" /> 
<META name="keywords" content="HTML, CSS, XML" /> 
<title>Test</title>
</head>
<body>
</body>
</html>'''

soup = BeautifulSoup.BeautifulSoup(html)
for x in soup.findAll('meta'):
    print x

Result:

<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<meta name="keywords" content="HTML, CSS, XML" />
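
For anyone on Python 3, roughly the same check can be sketched with bs4, assuming that package is installed. find_all is the bs4 spelling of findAll, and the html.parser backend is my choice here, not something the code above specifies.

```python
from bs4 import BeautifulSoup  # assumption: bs4 (BeautifulSoup 4) is installed

html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<META name="keywords" content="HTML, CSS, XML" />
<title>Test</title>
</head>
<body>
</body>
</html>'''

soup = BeautifulSoup(html, "html.parser")

# Both <meta> and <META> are matched by the lower-case name.
found = [(tag.name, tag.get("name")) for tag in soup.find_all("meta")]
for tag_name, name_attr in found:
    print(tag_name, name_attr)

# The <title> tag is reachable the same way.
title_text = soup.title.string
print(title_text)
```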