Replace sequential repeating tags with one of that tag in Ruby
I'm trying to replace multiple sequential <br> tags with just one <br> tag using Ruby.
For instance:
Hello
<br><br/><br>
World!
would become
Hello
<br>
World!
You could do this with a regular expression, like:
"Hello\n<br><br/><br>\nworld".gsub(/(?im)(<br\s*\/?>\s*)+/,'<br>')
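To explain that: the (?im) part sets options saying the match should be case-insensitive (i) and that . should match newlines (m). The pattern contains no ., so the m option has no effect here. The grouped expression (<br\s*\/?>\s*) matches <br> (optionally with whitespace and a trailing /) followed by any whitespace, and the + says to match one or more of that group. Because the trailing \s* is part of the repeated group, whitespace after the last <br> is consumed too, so this example returns "Hello\n<br>world" rather than keeping the newline before "world".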
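If you want to keep the whitespace after the last <br> (the expected output in the question keeps the newline before "World!"), one variant is to match a first <br> and then only the further <br>s that follow it. This is a sketch, not the original answer's code; the method name collapse_brs is hypothetical:

```ruby
# Hypothetical helper; a sketch, not the original answer's code.
# Matches one <br> (any case, optional whitespace and "/"), then any further
# <br>s separated from it only by whitespace. Whitespace after the final
# <br> is outside the match, so it is preserved.
def collapse_brs(html)
  html.gsub(/<br\s*\/?>(?:\s*<br\s*\/?>)*/i, '<br>')
end

puts collapse_brs("Hello\n<br><br/><br>\nWorld!").inspect
# prints "Hello\n<br>\nWorld!"
```

A single <br>, or <br>s separated by real text (as in "a<br>b<br>c"), is left unchanged.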
However, I should point out that in general it's not a good idea to use regular expressions for manipulating HTML; you should use a proper parser instead. For example, here's a better way of doing it using Nokogiri:
require 'nokogiri'
document = Nokogiri::HTML.parse("Hello
<br><br/><br>
World!")
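document.search('//br').each do |node|
  # node.next is the immediate sibling (possibly a text node) and is nil
  # when the <br> is the last child of its parent, so guard against nil.
  node.remove if node.next && node.next.name == 'br'
end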
puts document
That will produce output like:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Hello
<br>
World!</p></body></html>
(The parser turns your input into a well-formed document, which is why the output has a DOCTYPE and enclosing <html><body><p> tags.)