
Replace sequential repeating tags with one of that tag in Ruby

I'm trying to replace multiple sequential <br> tags with just one <br> tag using Ruby.

For instance:

Hello
<br><br/><br>
World!

would become

Hello
<br>
World!


You could do this with a regular expression, like:

 "Hello\n<br><br/><br>\nworld".gsub(/(?im)(<br\s*\/?>\s*)+/,'<br>')

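To explain that: the (?im) part sets two options. i makes the match case-insensitive, and m (in Ruby) makes . match newlines; since this pattern contains no ., the m has no effect here and could be dropped. The grouped expression (<br\s*\/?>\s*) matches a <br> tag (optionally with whitespace and a trailing / before the >) followed by any amount of whitespace, and the + says to match one or more repetitions of that group. Note that any whitespace after the last <br> in a run is part of the match, so it is replaced along with the tags.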
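If the whitespace after the run should be kept, one possible variant is to match only runs of two or more tags and stop at the final >. This is a sketch rather than the pattern from the answer above; the names BR_RUN and squeeze_brs are made up for illustration:

```ruby
# Variant pattern (an assumption, not the original): a <br> followed by
# one or more further <br> tags, each optionally preceded by whitespace.
# A single <br> is left untouched, and whitespace after the last tag in
# the run is not consumed.
BR_RUN = /<br\s*\/?>(?:\s*<br\s*\/?>)+/i

# Hypothetical helper: collapse every run of <br> tags into one <br>.
def squeeze_brs(html)
  html.gsub(BR_RUN, '<br>')
end

original = "Hello\n<br><br/><br>\nworld".gsub(/(?im)(<br\s*\/?>\s*)+/, '<br>')
variant  = squeeze_brs("Hello\n<br><br/><br>\nworld")

p original  # the newline after the run is swallowed
p variant   # the newline after the run is preserved
```

The trade-off is that a lone <br/> or <BR> is no longer normalised to <br>, because only runs of two or more tags are rewritten.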

However, I should point out that in general it's not a good idea to use regular expressions for manipulating HTML - you should use a proper parser instead. For example, here's a better way of doing it using Nokogiri:

require 'nokogiri'

document = Nokogiri::HTML.parse("Hello
<br><br/><br>
World!")

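# Remove every <br> that is immediately followed by another <br>,
# so only the last tag of each run survives. The &. guards against
# a <br> that is the last node in its parent (node.next is nil).
document.search('//br').each do |node|
    node.remove if node.next&.name == 'br'
end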

puts document

That will produce output like:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Hello
<br>
World!</p></body></html>

(The parser turns your input into a well-formed document, which is why you have the DOCTYPE and enclosing <html><body><p> tags.)
