
Why does this Nokogiri command strip out HTML tags?

This is a continuation of a previous question. I'm having problems with this Nokogiri snippet:

>> require 'nokogiri'
>> html = 'bad<p>markup</p>with<img src="foo.jpg">'
>> Nokogiri::HTML(html).at_css('body').children.map {|x| '<p>' + x.text + '</p>'}.join('') 
=> "<p>bad</p><p>markup</p><p>with</p><p></p>"

What happened to my image tag? It seems that Nokogiri is stripping all of the HTML tags present (including my original <p> around the word "markup") and replacing them. How do I prevent this? All I want is to make sure that text which isn't inside any tag gets wrapped in a <p> tag.


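Nokogiri isn't stripping your tags; your map block is. Node#text returns only a node's text content, so the markup of element nodes is discarded (an <img> has no text at all, which is where the empty <p></p> comes from). Only wrap the node in a p tag if it is a text node; otherwise call to_html on it, which serializes the node together with its markup: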

require 'nokogiri'

html = 'bad<p>markup</p>with<img src="foo.jpg">'

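Nokogiri::HTML(html).at_css('body').children.map do |x|
  if x.text?
    # Bare text node: wrap it in its own paragraph.
    '<p>' + x.text + '</p>'
  else
    # Element node (<p>, <img>, ...): keep its markup by serializing it.
    x.to_html
  end
end.join('')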
#=> "<p>bad</p>\n<p>markup</p><p>with</p><img src=\"foo.jpg\">"