How can unwanted tags be removed from HTML using Nokogiri?
I need to strip out all font tags from a document. When attempting to do so with the following Ruby code, other elements and text within the font tags are lost. I've also attempted to iterate through all children elements and make them siblings of the font tag before unlinking the font tag--which also results in lost HTML. What is a good method for removing tags which can contain other elements and/or text?
doc.css('font').each do |element|
  element.unlink
end
UPDATE (in response to first solution):
The problem with using node.children to obtain the children and then move the children to the font node's parent node is that none of the children nodes include the text found within the font node. As soon as the font tag is removed (unlinked), all text within the font tag also disappears from the document.
My revised question is thus: how do I use Nokogiri to obtain the text of the font node, and how can this text be moved to replace the font tag, in the font node's position?
I created a more generic solution based on the code in your comment:
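module Filter
  # Removes every element whose name is in +list+ but keeps its children
  # (text and nested elements) in the same position.
  def remove_tags_preserve_content!(*list)
    xpath('.//*').each do |element|
      next unless list.include?(element.name)
      # Move the children rather than cloning them, so that nested matches
      # already collected by xpath stay attached to the document.
      element.replace(element.children)
    end
    self
  end
end

# Node covers both Element and Document, so the method also works on a parsed doc.
class Nokogiri::XML::Node
  include Filter
end

class Nokogiri::XML::NodeSet
  include Filter
end

# === Example ===
doc.remove_tags_preserve_content!('font')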
The problem is you're lopping off the node, which also trims the child nodes. You need to preserve the children then append them to the parent node. Once you've done that you can delete the target node.
Take a look at "Replace Node w/ Children" - http://rubyforge.org/pipermail/nokogiri-talk/2009-June/000333.html
In that message Aaron is talking about replacing XML nodes, but it's all the same once an HTML document has been parsed by Nokogiri. You'll need to do some minor tweaks but it should get you going.