How to parse HTML source code with Ruby/Nokogiri?
I've successfully used Ruby (1.8) and Nokogiri's CSS parsing to pull front-facing data out of web pages.
However, I now need to pull data from a series of pages where it is stored in the "meta" tags of the page source.
One of the lines I need is the following:
<meta name="geo.position" content="35.667459;139.706256" />
I've tried using XPath but haven't been able to get it right.
Any help as to what syntax is needed would be much appreciated.
Thanks
This is a good case for a CSS attribute selector. For example:
doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end
The equivalent XPath expression is almost identical:
doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end
require 'nokogiri'
doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"