Regular Expression to remove &quot; &apos; from a string in Python

2023-04-03 18:02

I am fetching results from an RSS feed using the following code:

try:
    desc = item.xpath('description')[0].text
    if date is not None:
        # Prepend the date, separated from the description by a blank line.
        desc = date + "\n" + "\n" + desc
except Exception:
    desc = None

But sometimes the description in the feed contains a few HTML character entities, as below:

The text from XML looks like &quot; and with &apos; and other &...; stuff

When displaying the content, I do not want them to be displayed. Is there a regular expression to remove the HTML tags?

I used something called "Unescaping XML"; I don't know if it's helpful to you.

See: http://wiki.python.org/moin/EscapingXml

>>> from xml.sax.saxutils import unescape
>>> unescape("&lt; &amp; &gt;")
'< & >'
>>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
'\' "'
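On Python 3.4 and later, the standard library's html.unescape may be a simpler fit for the question, since it decodes named and numeric character references without an explicit entity table. The following is a minimal sketch, not the asker's actual code; the sample string `raw` is a hypothetical stand-in for a feed description.

```python
import html
from xml.sax.saxutils import unescape

# Hypothetical sample standing in for a description pulled from the feed.
raw = "The text looks like &quot; and with &apos; and &#8217; too"

# html.unescape decodes named entities and numeric character references.
clean = html.unescape(raw)
print(clean)

# saxutils.unescape only handles &lt;, &gt;, &amp; plus the entities
# passed in explicitly, so the numeric reference is left untouched.
partial = unescape(raw, {"&apos;": "'", "&quot;": '"'})
print(partial)
```

If the feed only ever contains a handful of known entities, the saxutils approach above is enough; html.unescape is the broader option when arbitrary &...; references may appear.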

Edit:

Just saw this; it may be interesting (not tested): unescape with urllib
