ruby fetching url content is always empty
I am so frustrated trying to use Ruby to fetch a specific url content.
I've tried many different ways like open-uri, standard request none worked so far. I always get empty html. I also tried to use python to fetch the same url which always returned the correct html content. I am really not sure why... Please help a开发者_Python百科s I am newbiew to both Ruby and Python... I want to use Ruby (prefer the tidy syntax and human friendly function names, easier to install libs using gem and homebrew (on mac) than python easy_install) but I am now considering Python because it just works (yet still trying to get my head around 2.x and 3.x issue). I may be doing something really stupid but I think is very unlikely.
ruby 1.9.2p136 (2010-12-25 revision 30365) [i386-darwin10.6.0]
Implementation 1:
url = URI.parse('http//:www.stackoverflow.com/') req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
puts res.body #empty
Implementation 2:
doc = Nokogiri::HTML(open("http//:www.stackoverflow.com/", "User-Agent" => "Safari"))
#empty
#I tried to use without user agent, without Nokogiri none worked.
Python Implementation which worked every time perfectly
f = urllib.urlopen("http//:www.stackoverflow.com/")
# Read from the object, storing the page's contents in 's'.
s = f.read()
f.close()
print s
If that is your exact code it is invalid for several reasons.
- http: should be http://
- URL needs a path. if you want the root page of example.com it needs to be http://example.com/ the trailing slash is significant.
- if you put 2 lines of code on one line you need to use ; to denote the end of the first line
SO
require 'net/http'
url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
puts res.body
Same is true with using open in nokogiri
EDIT: that site is returning bad results many times:
counter = 0
20.times do
url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
sleep 1
counter +=1 unless res.body.empty?
end
puts counter
for me this only returned once a non empty body. If you substitute in another site it works all the time
curl "http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia"
Yields the same inconsistent results.
Two examples with openURI (standard lib), a wrapper for (among others) the rather cumbersome Net::HTTP :
require 'open-uri'
open("http://www.stackoverflow.com/"){|f| puts f.read}
puts URI::parse("http://www.google.com/").read
精彩评论