Regular Expression to remove " ' from a string in Python

I am fetching my results from an RSS feed using the following code:

try:
    desc = item.xpath('description')[0].text
    if date is not None:
        desc = date + "\n" + "\n" + desc
except:
    desc = None

But sometimes the description in the feed contains a few escaped HTML characters, as below:

The text from XML looks like " and with ' and other &...; stuff

When displaying the content, I do not want them to be shown. Is there a regular expression to remove the HTML tags?


I used something called "Unescaping XML"; I don't know if it's helpful to you.

See: http://wiki.python.org/moin/EscapingXml

from xml.sax.saxutils import unescape

unescape("&lt; &amp; &gt;")
'< & >'

unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
'\' "'
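Applied to the feed description from the question, this could look roughly like the sketch below. The helper name `clean_description` and the `EXTRA_ENTITIES` mapping are made up for illustration; only `unescape` itself comes from the standard library. By default it decodes `&lt;`, `&gt;` and `&amp;`, so the quote entities have to be passed in explicitly.

```python
from xml.sax.saxutils import unescape

# Hypothetical mapping: entities beyond the three (&lt;, &gt;, &amp;)
# that unescape() decodes on its own.
EXTRA_ENTITIES = {"&apos;": "'", "&quot;": '"'}


def clean_description(text):
    """Decode XML entities in text; pass None through unchanged."""
    if text is None:
        return None
    return unescape(text, EXTRA_ENTITIES)


print(clean_description("looks like &quot; and with &apos; and &amp;"))
# prints: looks like " and with ' and &
```

In the question's code, this would mean calling `clean_description(desc)` once `desc` has been read from the item.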

Edit:

Just saw this, it may be interesting (not tested): unescape with urllib
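Another standard-library option, assuming Python 3.4 or later, is `html.unescape`. Unlike `xml.sax.saxutils.unescape`, it needs no extra mapping and also decodes numeric references such as `&#39;` and named HTML entities such as `&eacute;`. The sample string below is made up for illustration.

```python
import html

# Made-up sample mixing named entities and a numeric character reference.
raw = "Tom &amp; Jerry said &quot;hi&quot; &#39;there&#39; &eacute;"

decoded = html.unescape(raw)
print(decoded)
# prints: Tom & Jerry said "hi" 'there' é
```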
