Getting contents of title tags with Python script

I want to make a really simple script in Python that gets the contents of the title tag of a specified web page and then puts them into a MySQL database.

I have very (and I mean very) little experience with Python, but this needs to be done for my project. How can I do this in the simplest way possible?

I hope you are able to understand what I'm trying to ask.


  1. Study urllib2 to see how to download the webpage.
  2. Study BeautifulSoup to parse the HTML and pull out the title.
  3. Study the Python Database API Specification to insert rows into the MySQL database.
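As a rough Python 3 sketch of steps 1 and 2: in Python 3 the role of urllib2 is taken by `urllib.request`, and here the standard library's `html.parser` stands in for BeautifulSoup so that nothing extra needs installing. The names `TitleParser`, `extract_title` and `fetch_title` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._seen = False      # a <title> start tag was found
        self._in_title = False  # currently inside that <title>
        self._done = False      # the first </title> has been passed
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._seen = True
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # None when there is no <title>; internal whitespace is collapsed.
        if not self._seen:
            return None
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Step 2: parse an HTML string and return its title (or None)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url):
    """Steps 1 and 2: download a page and return its title (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)
```

For example, `fetch_title('http://www.python.org/')` downloads the page and returns its title, while `extract_title` can be tried on any HTML string without a network connection.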

Here is some example code to get you started:

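import urllib2
import BeautifulSoup
import MySQLdb

# Download the page and parse the HTML (Python 2, BeautifulSoup 3)
f = urllib2.urlopen('http://www.python.org/')
soup = BeautifulSoup.BeautifulSoup(f.read())
title = soup.find('title')
print(title.string)

# Connect to MySQL; replace the placeholders with your own settings
connection = MySQLdb.connect(
    host='HOST', user='USER',
    passwd='PASS', db='MYDB')
cursor = connection.cursor()

# Create the table on the first run
sql = '''CREATE TABLE IF NOT EXISTS foo (
           fooid int(11) NOT NULL AUTO_INCREMENT,
           title varchar(100) NOT NULL,
           PRIMARY KEY (fooid)
       )'''
cursor.execute(sql)

# Pass the title as a parameter so MySQLdb escapes it for you
sql = 'INSERT INTO foo (title) VALUES (%s)'
args = [title.string]
cursor.execute(sql, args)
connection.commit()  # MySQLdb does not autocommit by default, so save the insert
cursor.close()
connection.close()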
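The database half of this maps onto any DB-API 2.0 (PEP 249) driver. To keep the sketch below runnable without a MySQL server, it uses the standard library's `sqlite3` in place of `MySQLdb`; that swap is an assumption for illustration only. With MySQLdb you would connect as in the code above and write `%s` instead of `?`, because MySQLdb uses the "format" parameter style while sqlite3 uses "qmark". `save_title` is a hypothetical helper name.

```python
import sqlite3


def save_title(connection, title):
    """Insert one title through a DB-API 2.0 connection; return the new row id."""
    cursor = connection.cursor()
    try:
        # SQLite spelling of the answer's table (AUTOINCREMENT, not AUTO_INCREMENT).
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS foo ("
            " fooid INTEGER PRIMARY KEY AUTOINCREMENT,"
            " title VARCHAR(100) NOT NULL)"
        )
        # The value is passed separately, never formatted into the SQL string.
        cursor.execute("INSERT INTO foo (title) VALUES (?)", (title,))
        # PEP 249 requires auto-commit to start off, so commit explicitly.
        connection.commit()
        return cursor.lastrowid
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_title(conn, "Welcome to Python.org")
    print(conn.execute("SELECT fooid, title FROM foo").fetchall())
    conn.close()
```

The explicit `commit()` is the step most often forgotten: without it the inserted row is discarded when the connection closes.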


Use urllib2 to open the web page, then parse the returned text with a regular expression to extract the title.
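A minimal Python 3 sketch of the regex approach, assuming the page has already been downloaded and decoded to a string (in Python 3, for example, with `urllib.request.urlopen(url).read().decode('utf-8')`). `TITLE_RE` and `title_from_html` are illustrative names. A regular expression is more brittle than an HTML parser; it will also match a `<title>` that appears inside an HTML comment or a script, for instance.

```python
import html
import re

# Matches <title ...>...</title>, case-insensitively and across newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def title_from_html(page):
    """Return the text of the first <title> element, or None if none matches."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(match.group(1)).split())
```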
