
How to parse html source code with ruby/nokogiri?

I've successfully used Ruby (1.8) and Nokogiri's CSS parsing to pull out front-facing data from web pages.

However I now need to pull out some data from a series of pages where the data is in the "meta" tags in the source code of the page.

One of the lines I need is the following:

<meta name="geo.position" content="35.667459;139.706256" />

I've tried using XPath but haven't been able to get it right.

Any help as to what syntax is needed would be much appreciated.

Thanks


This is a good case for a CSS attribute selector. For example:

doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

The equivalent XPath expression is almost identical:

doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end


require 'nokogiri'

doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"
0
