Python BeautifulSoup encoding
I have some code that reads an HTML file and modifies some text using Beautiful Soup. It works fine, but when I look at the output, this part of my HTML file has been changed automatically:
Original: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Modified automatically: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I don't want any of the file contents to change automatically. Can someone help me with this?
Here is my code:
import re
import sys
from BeautifulSoup import BeautifulSoup
f = open(sys.argv[1], "r")  # "rw" is not a valid mode; the file is only read here
data = f.read()
soup = BeautifulSoup(data)
comma = re.compile(',')
# Replace each comma in the text nodes with the &sbquo; entity (note the trailing semicolon)
for t in soup.findAll(text=comma):
    t.replaceWith(t.replace(',', '&sbquo;'))
print soup
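Try passing the encoding explicitly when you serialize the soup. By default, BeautifulSoup outputs UTF-8 and rewrites the charset in the meta tag to match the output encoding: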
print soup.__str__("ISO-8859-1")
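
If the input charset is not known in advance, one option is to read it from the raw bytes and reuse it for the output. The following is only a minimal sketch using the standard library, without BeautifulSoup. The helper declared_charset and its regular expression are hypothetical names made up for illustration, and the sample bytes are invented. It assumes the document declares its charset in a meta tag.

```python
import re

# Hypothetical pattern: captures the charset name declared in a meta tag.
CHARSET_RE = re.compile(br'charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw_html, default="utf-8"):
    """Return the charset named in the raw HTML bytes, or `default` if none is found."""
    match = CHARSET_RE.search(raw_html)
    return match.group(1).decode("ascii") if match else default


# Invented sample input: ISO-8859-1 bytes with a matching meta declaration.
raw = (b'<meta http-equiv="Content-Type" '
       b'content="text/html; charset=iso-8859-1" />\n<p>caf\xe9, th\xe9</p>')

encoding = declared_charset(raw)
text = raw.decode(encoding)           # bytes -> text using the declared charset
text = text.replace(",", "&sbquo;")   # the same substitution the question performs
output = text.encode(encoding)        # text -> bytes in the original charset
```

With BeautifulSoup, the detected name could be passed to soup.__str__(encoding) instead of a hard-coded "ISO-8859-1", so the declared charset and the actual bytes stay in agreement.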