Removing XML entities from a string in Ruby
I am trying to parse an RSS channel with the simple-rss library.
Unfortunately, I get a lot of garbage in the description node:
<description><p>
some description
</p>
<a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;version=28">(diff)</a></description>
I need to retrieve the text ("some description") and, optionally, the URL.
What is the best way to do it? A regexp? If so, could you give me an example, please?
That's not garbage; it is just an HTML-escaped string. By "the url", I assume you mean the link together with its HTML tags (<a></a>). The following code should work.
require 'cgi'
description = "</p> <a href=\"http://url.com/trac/xxx/wiki/foo?action=diff&amp;version=28\">(diff)</a>"
CGI.unescapeHTML(description) # => </p> <a href="http://url.com/trac/xxx/wiki/foo?action=diff&version=28">(diff)</a>
If you don't want the HTML tags, there are various ways to obtain just the URL. A simple regex for the URL should work; I leave that to you to figure out (hint: Google).
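For what it's worth, here is a minimal sketch of the regex route, using only the standard library. The helper name extract_text_and_url is made up for illustration, and it assumes the description looks like the string above (at most one link of interest, simple markup). Regexes are not a general HTML parser, so for messier feeds a real parser would probably be safer.

```ruby
require 'cgi'

# Hypothetical helper (not part of simple-rss): returns [text, url] for a
# description string shaped like the one in the question.
def extract_text_and_url(raw)
  # Decode entities such as &amp; so the URL comes out with a plain "&".
  html = CGI.unescapeHTML(raw)

  # Value of the first href attribute, or nil when there is no link.
  url = html[/href\s*=\s*["']([^"']+)["']/i, 1]

  # Drop the whole <a>...</a> element so its "(diff)" label is not kept.
  text = html.gsub(/<a\b.*?<\/a>/mi, ' ')
  # Drop any remaining tags, then collapse the leftover whitespace.
  text = text.gsub(/<[^>]+>/, ' ')
  text = text.gsub(/\s+/, ' ').strip

  [text, url]
end

description = "<p>\nsome description\n</p>\n<a href=\"http://url.com/trac/xxx/wiki/foo?action=diff&amp;version=28\">(diff)</a>"
text, url = extract_text_and_url(description)
puts text
puts url
```

The link element is removed before the other tags on purpose; otherwise the "(diff)" label would end up in the extracted text.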