Getting contents of title tags with Python script
I want to make a really simple script in Python that gets the contents of the title tags of a specified web page and then puts them into a MySQL database.
I have very (and I mean very) little experience with Python, but this needs to be done for my project. How can I do this in the simplest way possible?
I hope you are able to understand what I'm trying to ask.
- Study urllib2 to see how to download the webpage.
- Study BeautifulSoup to parse the HTML and pull out the title.
- Study the Python Database API Specification to insert rows into the MySQL database.
Here is some example code to get you started:
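import urllib2
import BeautifulSoup
import MySQLdb

# Download the page and parse the HTML
f = urllib2.urlopen('http://www.python.org/')
soup = BeautifulSoup.BeautifulSoup(f.read())

# Pull out the <title> tag and show its text
title = soup.find('title')
print(title.string)

# Connect to MySQL (replace the placeholders with your own settings)
connection = MySQLdb.connect(
    host='HOST', user='USER',
    passwd='PASS', db='MYDB')
cursor = connection.cursor()

# Create the table on first run
sql = '''CREATE TABLE IF NOT EXISTS foo (
    fooid int(11) NOT NULL AUTO_INCREMENT,
    title varchar(100) NOT NULL,
    PRIMARY KEY (fooid)
)'''
cursor.execute(sql)

# Insert the title; the driver escapes the %s parameter for you
sql = 'INSERT INTO foo (title) VALUES (%s)'
args = [title.string]
cursor.execute(sql, args)

# MySQLdb does not autocommit by default, so commit to save the row
connection.commit()
cursor.close()
connection.close()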
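The example above targets Python 2 (urllib2 and the old BeautifulSoup module). If you are on Python 3, the download-and-parse step could look roughly like the sketch below, which uses only the standard library: urllib.request takes the place of urllib2, and html.parser stands in for BeautifulSoup. The names TitleParser, get_title and fetch_title are made up for this illustration, and the utf-8 fallback is an assumption about the page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest
        if tag == 'title' and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def get_title(html):
    """Return the stripped title text, or None if there is no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = ''.join(parser.parts).strip()
    return title or None


def fetch_title(url):
    """Download a page and return its title (needs network access)."""
    with urlopen(url) as response:
        # Assumption: fall back to utf-8 when the server names no charset
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset, errors='replace')
    return get_title(html)
```

Calling fetch_title('http://www.python.org/') would then give you the string to pass as the argument of the INSERT statement.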
Use urllib2 to open the web page, then parse the returned text with a regex to retrieve the title.
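A minimal sketch of the regex approach, written for Python 3 and assuming the page text has already been downloaded into a string. The names TITLE_RE and title_from_html are hypothetical. Keep in mind that a regex is brittle on real-world HTML, so an HTML parser is usually the safer choice.

```python
import re

# Non-greedy match of whatever sits between <title ...> and </title>;
# DOTALL lets the title span several lines, IGNORECASE accepts <TITLE>
TITLE_RE = re.compile(r'<title\b[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def title_from_html(html):
    """Return the first title with whitespace collapsed, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse newlines and runs of spaces into single spaces
    return ' '.join(match.group(1).split())
```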