Python BeautifulSoup encoding

I have some code that reads an HTML file and modifies some of its text using Beautiful Soup. It works fine, but when I look at the output, this part of my HTML file has been changed automatically:

Original : <meta http-equiv="Content-Type" content="text/html; charset=**iso-8859-1**" />

Changed automatically to: <meta http-equiv="Content-Type" content="text/html; charset=**utf-8**" />

I don't want any of the file contents to change automatically. Can someone help me with this?

Here is my code:

import re
import sys

from BeautifulSoup import BeautifulSoup

# Read the HTML file named on the command line.
f = open(sys.argv[1], "r")
data = f.read()
f.close()

soup = BeautifulSoup(data)

comma = re.compile(',')

# Replace every comma found in a text node with the &sbquo; entity.
for t in soup.findAll(text=comma):
    t.replaceWith(t.replace(',', '&sbquo;'))

print soup


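Beautiful Soup converts the document to Unicode when parsing and serializes it as UTF-8 by default, rewriting the charset in the meta tag to match the output encoding. To keep the original encoding, pass it explicitly when rendering the soup. Try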

print soup.__str__("ISO-8859-1")
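
To see why the charset declaration has to follow the output encoding, here is a small standard-library sketch. It does not use Beautiful Soup at all; `reencode` is a hypothetical helper and the sample bytes are made up for illustration. It only shows that the same text becomes different bytes under ISO-8859-1 and UTF-8, so a page written as UTF-8 but still labelled iso-8859-1 would display incorrectly.

```python
# Illustration only: why the output bytes and the declared charset must agree.
# `reencode` is a hypothetical helper, not part of Beautiful Soup.

def reencode(raw_bytes, source_encoding, target_encoding):
    """Decode bytes from one encoding and re-encode them in another."""
    return raw_bytes.decode(source_encoding).encode(target_encoding)

# The word "cafe" with an acute accent, as stored in an ISO-8859-1 file:
# the accented letter is the single byte 0xE9.
latin1_page = b"caf\xe9"

# Serialized as UTF-8, the accented letter becomes two bytes (0xC3 0xA9).
utf8_page = reencode(latin1_page, "iso-8859-1", "utf-8")

# Serialized in the original encoding, the bytes are unchanged.
same_page = reencode(latin1_page, "iso-8859-1", "iso-8859-1")

# What a reader would get if UTF-8 bytes were still labelled iso-8859-1:
# the two bytes are interpreted as two separate characters.
mislabelled = utf8_page.decode("iso-8859-1")
```

Passing "ISO-8859-1" when rendering the soup corresponds to the `same_page` case: the output bytes and the meta tag both stay in the original encoding.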