How do I wrap untagged text elements using Nokogiri?
For example, I have an HTML string:
<span class="no">1172</span><span class="r">case</span> primary_key_prefix_type
How can I use Nokogiri to wrap every piece of text that isn't inside a tag, like this:
<span class="no">1172</span><span class="r">case</span> <span>primary_key_prefix_type</span>
This doesn't feel like the most elegant solution, but it works:
require 'nokogiri'
# Given a node, find each whitespace-delimited word
# and wrap it in the supplied markup
def wrap_text( node, wrapper='<span />' )
  wrapper = Nokogiri::XML::DocumentFragment.parse(wrapper).children.first
  node.xpath('child::text()').each do |text_node|
    text_node.swap( text_node.text.gsub(/(\s*)(\S+)(\s*)/) do
      "#{$1}#{
        wrapper.clone.tap{ |w| w.inner_html = $2 }.to_html
      }#{$3}"
    end )
  end
  node
end
# Testing
html = Nokogiri::HTML '<body>
<p><span class="no">1172</span><span class="r">case</span> primary_key_prefix_type</p>
<p>Hello <b>cool</b> world #42!</p>
</body>'
html.search('p').each{ |para| wrap_text(para) }
puts html.at('body')
#=> <body>
#=> <p><span class="no">1172</span><span class="r">case</span> <span>primary_key_prefix_type</span></p>
#=> <p><span>Hello</span> <b>cool</b> <span>world</span> <span>#42!</span></p>
#=> </body>
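To see what the regex inside wrap_text is doing on its own, here is a minimal sketch that applies the same pattern to a plain string, with no Nokogiri involved. The helper name wrap_words and its open_tag/close_tag parameters are made up for illustration and are not part of the answer's code; unlike wrap_text, it does no HTML escaping or parsing.

```ruby
# Hypothetical helper (not part of wrap_text): the same regex applied to a
# plain string, so the whitespace handling is easy to see.
def wrap_words(text, open_tag = '<span>', close_tag = '</span>')
  text.gsub(/(\s*)(\S+)(\s*)/) do
    # $1 and $3 are the whitespace around the word, $2 is the word itself;
    # the whitespace is put back outside the wrapper untouched.
    "#{$1}#{open_tag}#{$2}#{close_tag}#{$3}"
  end
end

p wrap_words(' world #42!')
#=> " <span>world</span> <span>#42!</span>"
```

Because the pattern requires at least one non-whitespace character (\S+), a text node that is only whitespace produces no match and comes back unchanged, which is why the newlines between the <p> elements in the test above are left alone.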
Edit: More examples:
# If your lines don't have an element wrapping them...
raw = [
  '<span class="no">1172</span><span class="r">case</span> primary_key',
  'Hello <b>cool</b> world #42!'
]
puts raw.map{ |line| wrap_text(Nokogiri::HTML(line).at('body')).inner_html }
#=> <span class="no">1172</span><span class="r">case</span> <span>primary_key</span>
#=> <p>Hello <b>cool</b> world #42!</p>
# (The second line is unchanged: the parser put its leading bare text inside
# an implicit <p>, so <body> has no direct text children for wrap_text to find.)
# If your lines each have exactly one element wrapping them...
wrapped = [
  '<a><span class="no">1172</span><span class="r">case</span> primary_key</a>',
  '<b>Hello <b>cool</b> world #42!</b>'
]
body = Nokogiri::HTML(wrapped.join("\n")).at('body')
puts body.children.map{ |e| wrap_text(e) }
#=> <a><span class="no">1172</span><span class="r">case</span> <span>primary_key</span></a>
#=> <b><span>Hello</span> <b>cool</b> <span>world</span> <span>#42!</span></b>