Nokogiri: Merge neighbour text nodes recursively?
I have a prepared Nokogiri page where the junk has been removed, but the text parts are still stored in different nodes.
What I want to do is merge all directly neighbouring text nodes into one single text node.
This is what I came up with:
# merge neighbour text nodes -> connect content
def merge_text_nodes(node)
  previoustext = false
  node.children.each_with_index do |item,i|
    if item.name != 'text()'
      merge_text_nodes(item)
      previoustext = false
    else
      if previoustext
        node.children[i-1].inner_html += item.inner_html
        item.remove
      end
      previoustext = true
    end
  end
end
But it doesn't work as expected; it seems to do nothing at all. Can someone point out the error or show me the correct way to do it?
Okay, finally I got it right myself:
def merge_text_nodes(node)
  prev_is_text = false
  newnodes = []
  node.children.each do |element|
    if element.text?
      if prev_is_text
        # append to the text node collected just before this one
        newnodes[-1].content += element.text
      else
        newnodes << element
      end
      element.remove
      prev_is_text = true
    else
      # recurse into non-text children before collecting them
      newnodes << merge_text_nodes(element)
      element.remove
      prev_is_text = false
    end
  end
  # re-attach the collected (and merged) children in their original order
  node.children.remove
  newnodes.each do |item|
    node.add_child(item)
  end
  return node
end
An interesting solution to this problem might be the following:
xml # your Nokogiri XML object, with unmerged text nodes
xml = Nokogiri::XML(xml.to_xml)
Re-parsing the XML from a string causes adjacent text nodes to be merged as a side effect.