Python fetching <title>

I want to fetch the title of a webpage that I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?

Is there a good parsing lib for this purpose?


Yes, I would recommend BeautifulSoup.

If you're getting the title, it's simply:

soup = BeautifulSoup(html)
myTitle = soup.html.head.title

or

myTitle = soup('title')[0]  # soup('title') returns a list of all matching tags

Taken from the documentation

It is tolerant of malformed markup and will usually produce a usable parse tree even when the HTML is messy.


Try Beautiful Soup:

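import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html)
title = soup.html.head.title
print title.string  # text inside <title>; title.contents would print a list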


Use Beautiful Soup.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string


Why are you all importing a whole extra library for one task? No regular expressions? Wasn't the request about urllib2, not bs4 or mechanize, which are third-party? With the standard library you can go through the HTML, match the string, and then split on '>' and '<' with re or whatever you prefer.

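# html is the page source as a string, e.g. urllib2.urlopen(url).read()
for line in html.splitlines():
    if '<title>' in line:
        # Keep only the text between the tags (assumes both are on this line)
        Title = line.split('<title>', 1)[1].split('</title>', 1)[0]
        break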

That's Python 2, I think. You can then strip the result down to just the title text.
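
If you want to stay within the standard library but avoid string splitting, a rough sketch with html.parser could look like this. It assumes Python 3 (where urllib2 became urllib.request) and that html is already a decoded str; TitleParser and extract_title are names made up for this example, not part of any library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> is matched as well.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title as a string, or None if no title text was found."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

On Python 3 you could feed it something like urllib.request.urlopen(url).read().decode('utf-8', 'replace'), assuming the page is UTF-8. Unlike the line-by-line split, this does not require the opening and closing tags to sit on the same line, though for anything beyond the title a dedicated parser such as Beautiful Soup is probably still less work.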
