Nokogiri: Merge neighbour text nodes recursively?

I have a prepared Nokogiri page where junk is removed, but the text parts are still stored in different nodes.

What I want to do is merge all directly adjacent text nodes into one single text node, recursively for the whole tree.

This is what I came up with:

#merge neighbour text nodes -> connect content
def merge_text_nodes(node)
  previoustext = false
  node.children.each_with_index do |item,i|
    if item.name != 'text()'
      merge_text_nodes(item)
      previoustext = false
    else
      if previoustext
        node.children[i-1].inner_html += item.inner_html
        item.remove
      end
      previoustext = true
    end
  end
end

But it doesn't seem to work as expected - it seems to do nothing at all... Can someone tell me how to do it right/show me the error/the correct way to do it?


Okay, finally I got it right myself:

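# Rebuilds node's child list, folding each run of adjacent text nodes
# into the first text node of that run. Recurses into non-text children.
def merge_text_nodes(node)
  prev_is_text = false

  newnodes = []
  node.children.each do |element|
    if element.text?
      if prev_is_text
        # Append to the text node that started this run.
        newnodes[-1].content += element.text
      else
        newnodes << element
      end
      element.remove
      prev_is_text = true
    else
      # Merge inside the child first, then keep it in order.
      newnodes << merge_text_nodes(element)
      element.remove
      prev_is_text = false
    end
  end

  # Re-attach the merged children in their original order.
  node.children.remove
  newnodes.each do |item|
    node.add_child(item)
  end

  return node
end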


An interesting solution to this problem might be the following:

xml # your Nokogiri XML object, with unmerged text nodes
xml = Nokogiri::XML(xml.to_xml)

Re-parsing the XML from a string causes adjacent text nodes to be merged as a side effect.
