How to keep blank results for Nokogiri's NodeSet#search method
I want to run the search method of Nokogiri::XML::NodeSet on a NodeSet called nodeset, using an XPath rule, like this:
nodeset.search(rule)
The code above returns a NodeSet, but it leaves out the elements that do not match the rule. What I want is this: if an element in nodeset matches the rule, return the matched result; if it does not match, return a blank string in its place, so that I can tell which elements in the calling nodeset matched and which did not.
Could someone tell me how to do this? I would appreciate your help very much.
Nokogiri's NodeSet supports set operations similar to Ruby arrays. Instead of keeping blanks in your matched set, find the missed items after the fact:
require 'nokogiri'
doc = Nokogiri::XML <<-ENDXML
<root>
<a id="a1" class="foo">
<a id="a1a" class="foo" />
<a id="a1b" class="foo" andalso="this" />
</a>
<a id="a2" class="foo" andalso="this">
<a id="a2a" class="bar" />
<a id="a2b" class="bar" andalso="this" />
</a>
<a id="a3" class="foo" andalso="this" />
</root>
ENDXML
foos = doc.xpath('//a[@class="foo"]')
p foos.map{ |e| e['id'] }
#=> ["a1", "a1a", "a1b", "a2", "a3"]
subselect = foos.xpath('self::*[@andalso="this"]')
p subselect.map{ |e| e['id'] }
#=> ["a1b", "a2", "a3"]
missed = foos - subselect
p missed.map{ |e| e['id'] }
#=> ["a1", "a1a"]
If you really want non-nodes in the result, you'll have to use #map instead of #search or other Nokogiri methods, and get an Array instead of a NodeSet:
subselect = foos.map do |el|
if el['andalso']=='this'
el
else
""
end
end
p subselect.map{ |e| e=="" ? "" : e['id'] }
#=> ["", "", "a1b", "a2", "a3"]
I don't know Nokogiri well enough to know how well this will work, but I suspect the following example may suggest a way forward. It assumes that NodeSet behaves like a Ruby array, which it does according to its API docs [1]:
a = (0..9).to_a
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
evens = a.select { |i| i % 2 == 0 }
=> [0, 2, 4, 6, 8]
odds = a - evens
=> [1, 3, 5, 7, 9]
I believe you should be able to do something similar with your nodeset so that when your search has been performed, you can find the non-matched nodes by subtracting the new nodeset from the original one.
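Continuing the plain-array analogy, here is a rough sketch of how the matched set could be turned into a result that keeps a blank placeholder for every non-match. It uses only Ruby arrays, so whether it carries over unchanged to a NodeSet is an assumption; the variable name aligned is illustrative.

```ruby
a     = (0..9).to_a
evens = a.select { |i| i.even? }
odds  = a - evens

# Walk the original collection and keep each matched item,
# substituting "" wherever the item is not in the matched set.
aligned = a.map { |i| evens.include?(i) ? i : "" }

p aligned
#=> [0, "", 2, "", 4, "", 6, "", 8, ""]
```

The result has the same length and order as the original collection, so the blanks mark exactly which positions did not match.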
[1] http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/NodeSet.html#M000448
Here's how I'd go about it:
require 'nokogiri'
xml = <<EOT
<xml>
<find_node>foo</find_node>
<ignore_node>bar</ignore_node>
<find_node>foo</find_node>
<ignore_node>bar</ignore_node>
</xml>
EOT
# parse the document...
doc = Nokogiri::XML(xml)
# find the nodes we want...
desired_nodes = doc.search('//find_node')
# see if it's working...
desired_nodes.map{ |n| n.to_xml } # => ["<find_node>foo</find_node>", "<find_node>foo</find_node>"]
# walk the tree, grabbing the text or '' depending on whether the node is a hit or a miss...
node_result = doc.search('/xml/*').map{ |n| desired_nodes.include?(n) ? n.text : '' }
# ** here's the result **
node_result # => ["foo", "", "foo", ""]
# if we wanted to we could grab the desired_nodes' text...
desired_nodes.map{ |n| n.text } # => ["foo", "foo"]
# or find the ignored nodes...
ignored_nodes = doc.search('/xml/*') - desired_nodes
ignored_nodes.map{ |n| n.to_xml } # => ["<ignore_node>bar</ignore_node>", "<ignore_node>bar</ignore_node>"]
# ...and grab the ignored_nodes' text...
ignored_nodes.map{ |n| n.text } # => ["bar", "bar"]