Making multiple HTTP requests asynchronously

2022-12-18 03:10

require 'net/http'

urls = [
  {'link' => 'http://www.google.com/'},
  {'link' => 'http://www.yandex.ru/'},
  {'link' => 'http://www.baidu.com/'}
]

urls.each do |u|
  u['content'] = Net::HTTP.get( URI.parse(u['link']) )
end

print urls

This code runs synchronously: the first request, then the second, then the third. I would like to send all the requests asynchronously and print urls after all of them are done.

What is the best way to do it? Is Fiber suited for that?

I just saw this, a year and a bit later, but hopefully not too late for some googler...

Typhoeus is by far the best solution for this. It wraps libcurl in a really elegant fashion. You can set max_concurrency up to about 200 without it choking.

With respect to timeouts, if you pass Typhoeus a :timeout flag, it will just register a timeout as the response... and then you can even put the request back in another hydra to try again if you like.

Here's your program rewritten with Typhoeus. Hopefully this helps anybody who comes across this page later!

require 'typhoeus'

urls = [
  'http://www.google.com/',
  'http://www.yandex.ru/',
  'http://www.baidu.com/'
]

hydra = Typhoeus::Hydra.new

successes = 0

urls.each do |url|
    request = Typhoeus::Request.new(url, timeout: 15) # seconds in current Typhoeus; older releases took milliseconds
    request.on_complete do |response|
        if response.success?
            puts "Successfully requested " + url
            successes += 1
        else
            puts "Failed to get " + url
        end
    end
    hydra.queue(request)
end

hydra.run 

puts "Fetched all urls!" if successes == urls.length

Here's an example using threads.

require 'net/http'

urls = [
  {'link' => 'http://www.google.com/'},
  {'link' => 'http://www.yandex.ru/'},
  {'link' => 'http://www.baidu.com/'}
]

urls.each do |u|
  Thread.new do
    u['content'] = Net::HTTP.get( URI.parse(u['link']) )
    puts "Successfully requested #{u['link']}"

    # use a different block variable so the outer u is not shadowed
    if urls.all? { |entry| entry.key?('content') }
      puts "Fetched all urls!"
      exit
    end
  end
end

sleep
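
One caveat with the sleep/exit approach above: if any request raises, that thread dies without setting 'content', so the all? check never passes and the main thread sleeps forever. Here is a minimal sketch of one way around that, using only the standard library. The helper name fetch_into and its fetcher parameter are hypothetical (they are not part of Net::HTTP); the fetcher defaults to Net::HTTP.get and can be swapped for a stub to try the code offline.

```ruby
require 'net/http'

# Hypothetical helper: fills each hash with either 'content' or 'error'.
# `fetcher` is any callable that takes a URI and returns the body.
def fetch_into(urls, fetcher: ->(uri) { Net::HTTP.get(uri) })
  threads = urls.map do |u|
    Thread.new do
      begin
        u['content'] = fetcher.call(URI.parse(u['link']))
      rescue StandardError => e
        # Record the failure instead of letting the thread die silently.
        u['error'] = e.message
      end
    end
  end
  threads.each(&:join) # block until every request has finished or failed
  urls
end
```

Calling fetch_into(urls) with the array from the question and then printing urls should show every entry with either a 'content' or an 'error' key. Each thread writes only to its own hash, so no lock is needed here.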

I have written an in-depth blog post about this topic. It includes an answer that is somewhat similar to the one August posted, but with a few key differences: 1) It keeps track of all thread references in a "threads" array. 2) It uses the "join" method to wait for the threads at the end of the program.

require 'net/http'

# create an array of sites we wish to visit concurrently.
urls = ['link1','link2','link3']  
# Create an array to keep track of threads.
threads = []

urls.each do |u|
  # spawn a new thread for each url
  threads << Thread.new do
    Net::HTTP.get(URI.parse(u))
    # DO SOMETHING WITH URL CONTENTS HERE
    # ...
    puts "Request Complete: #{u}\n"
  end
end

# wait for threads to finish before ending program.
threads.each { |t| t.join }

puts "All Done!"

The full tutorial (and some performance information) is available here: https://zachalam.com/performing-multiple-http-requests-asynchronously-in-ruby/
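
If the contents are needed after all requests finish, rather than inside each thread, one option is Thread#value, which waits for the thread and returns the result of its block. The following is a minimal sketch under that assumption; fetch_bodies and its fetcher parameter are illustrative names, not taken from the tutorial.

```ruby
require 'net/http'

# Hypothetical helper: returns a Hash of url => response body.
def fetch_bodies(urls, fetcher: ->(url) { Net::HTTP.get(URI.parse(url)) })
  # Start every request first so they run concurrently...
  threads = urls.map { |url| [url, Thread.new { fetcher.call(url) }] }
  # ...then collect the results. Thread#value waits for the thread to
  # finish and re-raises any exception that was raised inside it.
  threads.to_h { |url, thread| [url, thread.value] }
end
```

Because Thread#value re-raises, a single failed request aborts the collection step; wrap the fetch in a rescue if partial results are acceptable.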

With the help of concurrent-ruby you can process data concurrently:

require 'net/http'
require 'concurrent-ruby'

class Browser
  include Concurrent::Async

  def render_page(link)
    sleep 5
    body = Net::HTTP.get( URI.parse(link) )
    File.open(filename(link), 'w') { |file| file.puts(body)}
  end

  private

  def filename(link)
    "#{link.gsub(/\W/, '-')}.html"
  end
end

pages = [
  'https://www.google.com',
  'https://www.bing.com',
  'https://www.baidu.com'
].map{ |link| Browser.new.async.render_page(link) }.map(&:value)

This can be done with the C library cURL. A Ruby binding for that library exists, but it doesn't seem to support this functionality out of the box. However, it looks like there is a patch adding/fixing it (example code is available on the page). I know this doesn't sound great, but it might be worth a try if there aren't any better suggestions.

It depends on what you want to do with the results afterwards. You can do it with simple threads:

see: http://snipplr.com/view/3966/simple-example-of-threading-in-ruby/

You could have a different thread execute each of the Net::HTTP.get calls, and just wait for all the threads to finish.

BTW, printing urls will print both the link and the content.

The work_queue gem is the easiest way to perform tasks asynchronously and concurrently in your application.

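require 'net/http'
require 'work_queue'

wq = WorkQueue.new 2 # Limit the maximum number of simultaneous worker threads

urls.each do |url|
  wq.enqueue_b do
    # url is assumed to be a URL string; get_response needs a URI object
    response = Net::HTTP.get_response(URI(url))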
    # use the response
  end
end

wq.join # All requests are complete after this
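
The same bounded-worker idea can also be sketched with only the standard library, in case adding a gem is not an option. This is a rough illustration, not how work_queue is implemented; fetch_with_pool, workers and fetcher are hypothetical names, and urls is assumed to be an array of URL strings.

```ruby
require 'net/http'

# Hypothetical helper: runs at most `workers` requests at a time and
# returns a Hash of url => response.
def fetch_with_pool(urls, workers: 2,
                    fetcher: ->(url) { Net::HTTP.get_response(URI(url)) })
  jobs = Queue.new
  urls.each { |url| jobs << url }
  jobs.close # once closed and drained, pop returns nil and workers stop

  results = {}
  lock = Mutex.new # guards the shared results hash
  pool = Array.new(workers) do
    Thread.new do
      while (url = jobs.pop)
        response = fetcher.call(url)
        lock.synchronize { results[url] = response }
      end
    end
  end
  pool.each(&:join) # all requests are complete after this
  results
end
```

Each worker thread pulls the next URL from the queue as soon as it is free, so the number of simultaneous requests never exceeds the workers value.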
