Python HTMLParser: replace some strings in the text data of an HTML file

I need to replace some strings in the text content of my HTML page. I can't call replace() on the whole file because only the data sections should change; tags and attributes must not be modified. I used HTMLParser for this, but I am stuck on writing the result back to the file. With HTMLParser I can parse the page and get the data content, on which I make the necessary changes. But how do I put it back into my HTML file?

Please help. Here is my code:

class EntityHTML(HTMLParser.HTMLParser):
    def __init__(self, filename):
        HTMLParser.HTMLParser.__init__(self)
        f = open(filename)
        self.feed(f.read())

    def handle_starttag(self, tag, attrs):
        """Needn't do anything here"""
        pass

    def handle_data(self, data):
        print data
        data = data.replace(",", "&sbquo")


HTMLParser doesn't build any in-memory representation of your HTML file; it only calls the handle_*() methods as it scans the input. You could rebuild the document yourself in those methods, but a simpler way would be to use BeautifulSoup:

>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<a title=,>,</a>')
>>> print soup
<a title=",">,</a>
>>> comma = re.compile(',')
>>> for t in soup.findAll(text=comma): t.replaceWith(t.replace(',', '&sbquo'))
>>> print soup
<a title=",">&sbquo</a>
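
For comparison, the do-it-yourself route mentioned above (rebuilding the document inside the handle_*() methods) could look roughly like the sketch below. It assumes Python 3, where the module is html.parser rather than HTMLParser. The names DataReplacer and replace_in_data are made up for illustration. It is not a full serializer: end tags are re-emitted in lowercase, and text inside script or style elements is also passed to handle_data, so it would be changed too.

```python
from html.parser import HTMLParser


class DataReplacer(HTMLParser):
    """Re-emit the parsed document, changing only the text data."""

    def __init__(self, old, new):
        # convert_charrefs=False makes the parser report entities such as
        # &amp; as separate events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.old = old
        self.new = new
        self.out = []  # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag as it appeared in the source,
        # so attribute values are never touched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through the same way.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where the replacement is applied.
        self.out.append(data.replace(self.old, self.new))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_in_data(html, old, new):
    """Return html with old replaced by new in text data only."""
    parser = DataReplacer(old, new)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

To write the result back, read the file into a string, pass it through replace_in_data(text, ",", "&sbquo;"), and write the returned string to the same path. Note that the entity is written here with its terminating semicolon.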