Regular expression for matching words between <blockquote> & </blockquote>
Basically I want to strip the document of the words between blockquote tags. I'm a regular expression newb, and even after using Rubular I'm no closer to the answer.
Any help is appreciated.
Use an HTML parser and forget regular expressions. A regex cannot correctly handle arbitrary HTML, such as nested tags or tags that carry attributes.
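require 'nokogiri'
doc = Nokogiri::HTML(your_html)       # your_html is the HTML source as a String
doc.xpath("//blockquote").remove      # drop every <blockquote> node and its contents
puts doc.to_html                      # the document without the blockquotes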
From: Strip text from HTML document using Ruby
There are more examples of how to use Nokogiri and XPath, if you look around.
raw example:
/<blockquote>([^<]*)<\/blockquote>/
Sample string:
<blockquote>Hello world</blockquote>
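For what it's worth, here is a rough sketch of how that pattern could be applied in plain Ruby, either to grab the quoted words or to strip the whole element. The sample `html` string is made up for illustration. Note that `[^<]*` stops at the first `<`, so this only works when the blockquote contains plain text and no nested tags.

```ruby
# Hypothetical sample input; the pattern is the one from the answer above.
html = "<p>Intro</p><blockquote>Hello world</blockquote><p>Outro</p>"
pattern = /<blockquote>([^<]*)<\/blockquote>/

# Capture group 1 holds the text between the tags.
quoted = html[pattern, 1]

# Replace every match (tags included) with an empty string.
stripped = html.gsub(pattern, "")

# A blockquote with nested markup is NOT matched by this pattern.
nested = "<blockquote><p>Hello</p></blockquote>"
nested_match = nested[pattern]

puts quoted
puts stripped
puts nested_match.inspect
```

Here `quoted` is the inner text, `stripped` is the document with the blockquote gone, and `nested_match` comes back `nil`, which is the kind of case where the parser approach is the safer bet.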
Type the following regex into Rubular: <blockquote>(.+?)</blockquote>
or for something more generic:
<.*?>(.+?)</.*?>
hope it helps!
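A small sketch of how the lazy patterns above behave in Ruby, in case it helps when moving from Rubular to real code. The input strings are invented for the example. Two things to watch, as far as I can tell: `.` does not match newlines unless the `m` flag is set, and the generic pattern does not check that the closing tag matches the opening one.

```ruby
# Hypothetical multi-line input with nested tags inside the blockquote.
html = "<blockquote>\n<p>Quoted <b>text</b></p>\n</blockquote>\n<p>Kept</p>"

# Without /m the dot stops at "\n", so nothing matches and nothing is removed.
single_line = /<blockquote>(.+?)<\/blockquote>/
unchanged = html.gsub(single_line, "")

# With /m the dot also matches newlines, so the whole element is removed.
multi_line = /<blockquote>(.+?)<\/blockquote>/m
stripped = html.gsub(multi_line, "")

# The generic pattern happily pairs an opening <b> with a closing </i>.
generic = /<.*?>(.+?)<\/.*?>/
mismatched = "<b>bold</i>"[generic, 1]

puts unchanged == html
puts stripped.inspect
puts mismatched
```

So the lazy version with `m` copes with nested tags and line breaks inside one blockquote, but it will still go wrong on blockquotes nested inside other blockquotes.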