Python fetching <title>
I want to fetch the title of a webpage which I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?
Is there a good parsing lib for this purpose?
Yes, I would recommend BeautifulSoup.
If you're getting the title, it's simply:
soup = BeautifulSoup(html)
myTitle = soup.html.head.title
or
myTitle = soup('title')[0]  # soup('title') returns a list of all matching tags
Taken from the documentation
It's very robust and copes well with messy, malformed HTML.
Try Beautiful Soup:
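import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
title = soup.html.head.title
print title.string  # .contents would print a list of child nodes instead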
Use Beautiful Soup.
import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string
Why are you all importing a whole extra library for one task? Why not regular expressions? Wasn't the request for urllib, not third-party packages like bs4 or mechanize? You can do this with the standard library: scan the HTML for the line containing '<title>', then split on '>' and '<' (with re or whatever you prefer).
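for line in html.splitlines():
    if '<title>' in line:
        # take the text between '<title>' and the next '<'
        title = line.split('<title>')[1].split('<')[0]
        break
That's Python 2, I think. Call .strip() on the result to remove any stray whitespace.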
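If you want to stay within the standard library but avoid hand-splitting strings, here is a minimal sketch using html.parser. It assumes Python 3 (where urllib2 became urllib.request), and the names TitleParser, extract_title and sample are made up for illustration. It runs on a hardcoded string rather than a fetched page, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title as a string, or None if no title text is found."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Stand-in for the decoded result of urlopen(...).read()
sample = "<html><head><TITLE>\n Example Domain </TITLE></head><body><p>Hi</p></body></html>"
print(extract_title(sample))
```

Unlike the line-splitting approach, this handles a title that spans several lines or is written in uppercase, and it can be extended with more handle_starttag cases if you need other tags later.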