Python HTMLParser to replace some strings in the data of an HTML file
I need to replace some strings in the data content of my HTML page. I can't use the replace function directly because I need to change only the data section; it shouldn't modify any of the tags or attributes. I used HTMLParser for this, but I am stuck on writing the result back to the file. Using HTMLParser I can parse the page and get the data content, on which I make the necessary changes. But how do I put it back into my HTML file?
Please help. Here is my code:
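import HTMLParser

class EntityHTML(HTMLParser.HTMLParser):
    def __init__(self, filename):
        HTMLParser.HTMLParser.__init__(self)
        # Read the whole file and feed it to the parser.
        f = open(filename)
        self.feed(f.read())
        f.close()

    def handle_starttag(self, tag, attrs):
        """Needn't do anything here"""
        pass

    def handle_data(self, data):
        print data
        # The replaced text lives only in this local variable;
        # nothing here writes it back to the file.
        data = data.replace(",", "&sbquo")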
HTMLParser doesn't construct any representation of your HTML file in memory. You could build one yourself in the handle_*() methods, but a simpler way would be to use BeautifulSoup:
>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<a title=,>,</a>')
>>> print soup
<a title=",">,</a>
>>> comma = re.compile(',')
>>> for t in soup.findAll(text=comma):
...     t.replaceWith(t.replace(',', '&sbquo'))
...
>>> print soup
<a title=",">&sbquo</a>
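For completeness, the handle_*() route mentioned above can be sketched roughly as follows. This is only an illustration under a few assumptions, not a drop-in solution: it targets Python 3, where the module is html.parser (in Python 2 it is HTMLParser); the names DataRewriter and rewrite are made up for this example; it emits '&sbquo;' with the trailing semicolon; it leaves script and style contents alone; and end tags are re-emitted in lowercase, so the output is not guaranteed to be byte-for-byte identical to the input outside the text nodes.

```python
from html.parser import HTMLParser


class DataRewriter(HTMLParser):
    """Echo the document back out, changing only the text (data) nodes."""

    def __init__(self):
        # convert_charrefs=False keeps references such as &amp; out of
        # handle_data, so they can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_raw = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_raw = True
        # get_starttag_text() returns the tag exactly as it was written.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_raw = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where content is modified.
        if not self.in_raw:
            data = data.replace(",", "&sbquo;")
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def rewrite(html):
    """Return html with commas in text nodes replaced by &sbquo;."""
    parser = DataRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To put the result back into the file, read the file into a string, pass it through rewrite(), and write the returned string to the same path.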