
Ruby: fetching URL content always returns empty

I am so frustrated trying to use Ruby to fetch the content of a specific URL.

I've tried many different approaches, such as open-uri and a standard Net::HTTP request, and none has worked so far: I always get empty HTML. I also tried fetching the same URL with Python, which returned the correct HTML every time. I am really not sure why. Please help, as I am a newbie to both Ruby and Python. I want to use Ruby (I prefer its tidy syntax and human-friendly function names, and installing libraries with gem and Homebrew on a Mac is easier than with Python's easy_install), but I am now considering Python because it just works (though I am still trying to get my head around the 2.x and 3.x issue). I may be doing something really stupid, but I think that is very unlikely.

ruby 1.9.2p136 (2010-12-25 revision 30365) [i386-darwin10.6.0]

Implementation 1:

url = URI.parse('http//:www.stackoverflow.com/') req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http|   http.request(req) }    
puts res.body #empty

Implementation 2:

doc = Nokogiri::HTML(open("http//:www.stackoverflow.com/", "User-Agent" => "Safari"))
#empty
# I also tried without the User-Agent header and without Nokogiri; neither worked.

Python implementation, which worked perfectly every time:

f = urllib.urlopen("http//:www.stackoverflow.com/")
# Read from the object, storing the page's contents in 's'.
s = f.read()
f.close()

print s


If that is your exact code, it is invalid for several reasons:

  1. http: should be http://
  2. The URL needs a path. If you want the root page of example.com, it needs to be http://example.com/ (the trailing slash is significant).
  3. If you put two lines of code on one line, you need to use ; to denote the end of the first statement.
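
A quick way to catch the first kind of mistake before any request is sent is to look at what URI.parse actually produced. This is only a sketch: fetchable_http_url? is a hypothetical helper name of my own, and www.example.com is a placeholder host. It treats a string as usable only if it parses as an absolute http/https URL with a host.

```ruby
require 'uri'

# Hypothetical helper (not from the original post): returns true only when
# the string parses as an absolute http/https URL with a non-empty host.
def fetchable_http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts fetchable_http_url?('http//:www.stackoverflow.com/') # misplaced colon
puts fetchable_http_url?('http:www.example.com')           # missing //
puts fetchable_http_url?('http://www.example.com/')        # well formed
```

Only the last call should print true. With the malformed strings there is no host for Net::HTTP.start to connect to.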

So:

require 'net/http'
require 'uri'

url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
# request_uri keeps the query string; url.path alone would drop it
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
puts res.body

The same applies when you pass the result of open to Nokogiri.
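
One detail worth checking in Net::HTTP code like this: url.path holds only the path, the query string lives separately in url.query, and url.request_uri combines the two. The sketch below only parses URLs and makes no request. www.example.com is a placeholder host I added to show the empty-path case from point 2.

```ruby
require 'uri'

url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')

puts url.path         # path only; the query string is not included
puts url.query        # the query string on its own
puts url.request_uri  # path plus query, as it appears on the HTTP request line

# Placeholder host, used only to show what happens without a trailing slash.
bare = URI.parse('http://www.example.com')
puts bare.path.inspect # empty string, because no path was given
puts bare.request_uri  # falls back to "/"
```

So a request built from url.path asks the server for the search page without any search terms, and a URL with no trailing slash has an empty path.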

EDIT: that site frequently returns bad results:

require 'net/http'

counter = 0
url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')

20.times do
  # request_uri keeps the query string; url.path alone would drop it
  req = Net::HTTP::Get.new(url.request_uri)
  res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
  sleep 1
  counter += 1 unless res.body.to_s.empty?
end

puts counter # number of non-empty responses out of 20

For me, this returned a non-empty body only once out of 20 requests. If you substitute another site, it works every time.

curl "http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia"

Yields the same inconsistent results.
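
If you have to live with a site that intermittently returns empty bodies, one option is to retry a few times. This is a rough sketch, not a fix for whatever the server is doing: first_non_empty_body is a hypothetical helper name of my own, and the canned array stands in for the flaky server so the example runs without network access.

```ruby
# Hypothetical helper: calls the block up to `attempts` times and returns the
# first non-empty body, or nil if every attempt came back empty.
def first_non_empty_body(attempts, delay = 1)
  attempts.times do |i|
    body = yield
    return body unless body.nil? || body.empty?
    sleep delay if i < attempts - 1 # no need to wait after the last attempt
  end
  nil
end

# Stand-in for a flaky server: two empty bodies, then real content.
canned = ['', '', '<html>ok</html>']
body = first_non_empty_body(5, 0) { canned.shift }
puts body

# With a real request the block could be something like:
#   first_non_empty_body(5) { Net::HTTP.get(url) }
```

The delay between attempts is there to avoid hammering the server; the value of one second mirrors the sleep 1 in the loop above and is otherwise arbitrary.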


Two examples with open-uri (standard library), a wrapper for (among others) the rather cumbersome Net::HTTP:

require 'open-uri'

# On Ruby 3.0 and later, call URI.open here instead of Kernel#open
open("http://www.stackoverflow.com/") { |f| puts f.read }

puts URI::parse("http://www.google.com/").read