How do I wrap untagged text elements using Nokogiri?

For example, I have an HTML string:

<span class="no">1172</span><span class="r">case</span> primary_key_prefix_type

How can I use Nokogiri to wrap every piece of text that isn't inside a tag, like this:

<span class="no">1172</span><span class="r">case</span> <span>primary_key_prefix_type</span>


This doesn't feel like the most elegant solution, but it works:

require 'nokogiri'

# Given a node, find each whitespace-delimited word
# and wrap it in the supplied markup
def wrap_text( node, wrapper='<span />' )
  wrapper = Nokogiri::XML::DocumentFragment.parse(wrapper).children.first
  node.xpath('child::text()').each do |text_node|
    text_node.swap( text_node.text.gsub(/(\s*)(\S+)(\s*)/) do
      "#{$1}#{
        wrapper.clone.tap{ |w| w.inner_html = $2 }.to_html
      }#{$3}"
    end )
  end
  node
end

# Testing
html = Nokogiri::HTML '<body>
  <p><span class="no">1172</span><span class="r">case</span> primary_key_prefix_type</p>
  <p>Hello <b>cool</b> world #42!</p>
</body>'

html.search('p').each{ |para| wrap_text(para) }
puts html.at('body')
#=> <body>
#=>   <p><span class="no">1172</span><span class="r">case</span> <span>primary_key_prefix_type</span></p>
#=>   <p><span>Hello</span> <b>cool</b> <span>world</span> <span>#42!</span></p>
#=> </body>
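
The core of wrap_text is the gsub call with three capture groups: leading whitespace, the word, and trailing whitespace. As a rough sketch of just that step, here it is applied to plain strings without Nokogiri. The wrap_words helper and its open_tag/close_tag parameters are hypothetical names made up for this illustration, and unlike wrap_text it does no HTML parsing or escaping.

```ruby
# Hypothetical helper (not part of Nokogiri or the answer above): wraps each
# whitespace-delimited word of a plain string in the given tags while keeping
# the original spacing around the words.
def wrap_words(text, open_tag = '<span>', close_tag = '</span>')
  # $1 = leading whitespace, $2 = the word, $3 = trailing whitespace
  text.gsub(/(\s*)(\S+)(\s*)/) do
    "#{$1}#{open_tag}#{$2}#{close_tag}#{$3}"
  end
end

puts wrap_words(' primary_key_prefix_type')
puts wrap_words('  world #42!')
puts wrap_words("  \n").inspect
```

Because the pattern requires at least one non-whitespace character (\S+), a whitespace-only string never matches and comes back unchanged, which is presumably why the indentation-only text nodes between elements are left alone by wrap_text.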

Edit: More examples:

# If your lines don't have an element wrapping them...
raw = [
  '<span class="no">1172</span><span class="r">case</span> primary_key',
  'Hello <b>cool</b> world #42!'
]
puts raw.map{ |line| wrap_text(Nokogiri::HTML(line).at('body')).inner_html }
#=> <span class="no">1172</span><span class="r">case</span> <span>primary_key</span>
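#=> <p>Hello <b>cool</b> world #42!</p>
# Note: the second line starts with bare text, so the HTML parser seems to wrap
# it in an implied <p>; wrap_text only visits the direct text children of the
# node it is given (here, body), so the words inside that <p> are left alone.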

# If your lines each have exactly one element wrapping them...
wrapped = [
  '<a><span class="no">1172</span><span class="r">case</span> primary_key</a>',
  '<b>Hello <b>cool</b> world #42!</b>'
]
body = Nokogiri::HTML(wrapped.join("\n")).at('body')
puts body.children.map{ |e| wrap_text(e) }
#=> <a><span class="no">1172</span><span class="r">case</span> <span>primary_key</span></a>
#=> <b><span>Hello</span> <b>cool</b> <span>world</span> <span>#42!</span></b>