Nokogiri: How to retrieve the text from an XML::Element, excluding the text from its descendants?
Is there a more elegant way to write the following code?
def get_text(element)
text_node = element.children.find &:text?
text_node.text if text_node
end
You can write
element.xpath('text()').to_s
which returns the raw text of the direct text children of element,
excluding any text in descendant nodes (whereas your code only returns the first text child of element).
Remember that the DOM is hierarchical so you need to remove the child nodes:
Starting with this:
require 'nokogiri'
xml = <<EOT
<xml>
<a>some text
<b>
<c>more text</c>
</b>
</a>
</xml>
EOT
doc = Nokogiri::XML(xml)
If you don't mind doing it destructively:
doc.at('b').remove
doc.text #=> "\n some text\n \n \n"
If you do mind:
a_node = Nokogiri::XML.fragment(doc.at('a').to_xml)
a_node.at('b').remove
a_node.text #=> "some text\n \n "
Strip the leftover whitespace and newlines (for example with String#strip) and you should be good to go.
Of course, this syntax will also help you:
==================================
doc = Nokogiri::Slop <<-EOXML
<employees>
<employee status="active">
<fullname>Dean Martin</fullname>
</employee>
<employee status="inactive">
<fullname>Jerry Lewis</fullname>
</employee>
</employees>
EOXML
====================================
# navigate!
doc.employees.employee.last.fullname.content # => "Jerry Lewis"
# or select the same elements with XPath
fullname = doc.xpath("//fullname")
puts fullname.text