Regular expression to remove &quot; and &apos; from a string in Python
I am fetching results from an RSS feed using the following code:
try:
    desc = item.xpath('description')[0].text
    if date is not None:
        desc = date + "\n" + "\n" + desc
except:
    desc = None
But sometimes the description in the feed contains a few HTML character entities, as below:
The text from XML looks like &quot; and with &apos; and other &...; stuff
When displaying the content, I do not want these to appear. Is there a regular expression to remove them?
I used something called "Unescaping XML"; I don't know if it's helpful to you.
See: http://wiki.python.org/moin/EscapingXml
>>> from xml.sax.saxutils import unescape
>>> unescape("&lt; &amp; &gt;")
'< & >'
>>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
'\' "'
Edit:
Just saw this, which may be interesting (not tested): unescape with urllib
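As a complement to the saxutils session above, here is a minimal sketch of decoding the entities instead of matching them with a regular expression. It assumes Python 3.4 or later, where the standard library provides html.unescape. The sample string `raw` and the helper `strip_quotes` are made up for illustration and are not part of the original feed code.

```python
import html
from xml.sax.saxutils import unescape

# Hypothetical sample text standing in for a feed description.
raw = "The text looks like &quot;this&quot; and &apos;that&apos; &amp; more"

# html.unescape decodes named and numeric character references in one call.
decoded = html.unescape(raw)

# saxutils.unescape handles only &lt;, &gt; and &amp; by default,
# so the quote entities have to be passed in explicitly.
extra_entities = {"&apos;": "'", "&quot;": '"'}
decoded_sax = unescape(raw, extra_entities)


def strip_quotes(text):
    """Decode entities, then drop the quote characters entirely (illustrative helper)."""
    return html.unescape(text).replace('"', "").replace("'", "")


print(decoded)
print(strip_quotes(raw))
```

If the goal is only to show readable text, decoding is probably enough; the `strip_quotes` variant is for the case where the quote characters themselves should disappear.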