How to use Nokogiri's xpath and at_xpath methods
I'm learning how to use Nokogiri, and a few questions came up based on this code:
require 'rubygems'
require 'mechanize'
post_agent = Mechanize.new # was WWW::Mechanize.new in Mechanize releases before 1.0
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
puts "\nabsolute path with tbody gives nil"
puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect
puts "\n.at_xpath gives an empty string"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").at_xpath('text()').to_s.strip.inspect
puts "\ntwo lines solution with .at_xpath gives an empty string"
rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].at_xpath('text()').to_s.strip.inspect
puts
puts "two lines working code"
rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].xpath('text()').to_s.strip
puts "\none line working code"
puts post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip
puts "\nanother one line code"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").xpath('text()').to_s.strip
puts "\none line code with full path"
puts post_page.parser.xpath("/html/body/div/div/div/div/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip
- Is it better to use // or / in XPath? @AnthonyWJones says that "the use of an unprefixed //" is not such a good idea.
- I had to remove tbody from any working XPath, otherwise I got a nil result. How is it possible to remove an element from the XPath and still get things to work?
- Do I have to use xpath twice to extract data if not using a full XPath?
- Why can't I make at_xpath work to extract data? It works nicely in "How do I parse an HTML table with Nokogiri?". What is the difference?
- // matches nodes at every level of the document, so it's much more expensive than /.
- You can use * as a placeholder.
- No, you can make an XPath query, get the element, then call Nokogiri's text method on the node.
- Sure you can. Have a look at "What is the absolutely cheapest way to select a child node in Nokogiri?" and my benchmark file. You will see an example of at_xpath.
I noticed you often use the text() expression. This is not required with Nokogiri. You can retrieve the node and then call the text method on it. It's much less expensive.
Also keep in mind Nokogiri supports CSS selectors. They can be easier if you are working with HTML pages.