Handling the <nobr> tag in Python's sgmllib

I'm trying to parse a page with a Python script, but a <nobr> tag combined with an '&' is giving me trouble. Here is the actual HTML:

<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> <NOBR>Simulation for 1st & 2nd path</NOBR></A>

The handle_data method of my parser (built on sgmllib) does not handle this data the way I expect. Here is the handle_data code:

def handle_data(self, data):
        self.datainfo.append(data)

I expect the datainfo list to contain only one element, namely "Simulation for 1st & 2nd path".

However, when I print datainfo, it actually contains 7 elements:

datainfo -> ['', '', 'Simulation for 1st', '&', '2nd path', '', '']

What's happening?


A bare '&' is not valid HTML. You need to encode the ampersand as &amp; for the markup to be valid.
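Separately from fixing the markup, it is usually safer not to assume that handle_data is called exactly once per run of text, since the parser may hand the text over in several pieces (as the datainfo output in the question shows). Here is a minimal sketch of buffering the pieces and joining them when the link closes. Note the assumptions: sgmllib only exists in Python 2, so this uses html.parser.HTMLParser from Python 3 as a stand-in, and the names LinkTextParser, link_texts and _chunks are made up for illustration.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text inside each <a> element as one string.

    Hypothetical helper for illustration; not part of the original script.
    """

    def __init__(self):
        super().__init__()
        self.datainfo = []    # one joined string per <a> element
        self._chunks = None   # pieces of the current link text; None outside <a>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <A> is reported as "a".
        if tag == "a":
            self._chunks = []

    def handle_data(self, data):
        # May be called several times for one run of text, so only buffer here.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Join the buffered pieces once the link is complete.
        if tag == "a" and self._chunks is not None:
            self.datainfo.append("".join(self._chunks).strip())
            self._chunks = None


def link_texts(markup):
    """Return the text of every <a> element found in markup."""
    parser = LinkTextParser()
    parser.feed(markup)
    parser.close()
    return parser.datainfo


html_snippet = ('<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> '
                '<NOBR>Simulation for 1st & 2nd path</NOBR></A>')
print(link_texts(html_snippet))  # ['Simulation for 1st & 2nd path']
```

The same join-on-end-tag idea should carry over to an sgmllib parser by buffering in handle_data and joining in the end-tag handler, though I have not run it against sgmllib itself.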
