Does Ruby have any construct similar to Clojure's pmap for parallel processing?
I'm trying to decide whether to implement an application in Ruby or Clojure. Two of the requirements involve parallel processing:
The app must make parallel calls to fetch XML feeds and other types of data over the internet. Many such calls are made, and serializing the calls is inefficient.
The responses to those calls ought ideally to be processed in parallel. Processing mainly means transforming raw XML down to a much smaller piece of structured data (a Ruby hash or Clojure map) and inserting that into a MySQL database or CouchDB database.
I know Ruby a lot better than Clojure but if this is the right sort of project for Clojure I am all for using it.
Clojure's pmap function seems ideal for these two requirements. I'm wondering if some Ruby library or feature has a similarly clean and easy way of doing parallel processing tasks like the above.
Making the pmap function reusable is similarly simple:
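module Enumerable
  # Run the block for each element in its own thread and
  # collect the results in the original order.
  def pmap
    map { |x| Thread.start { yield x } }.map { |t| t.join.value }
  end
end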
But, of course, using a proper thread pool / executor would probably be a good idea. Here’s an example.
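A minimal sketch of the thread pool idea, using only Ruby's built-in Queue and Thread. The name pooled_map and the pool_size parameter are made up for illustration and are not part of any library. It assumes the input is an Array and that the block is safe to call from several threads at once.

```ruby
# Hypothetical helper (not a standard-library method): map over items with
# at most pool_size worker threads, returning results in input order.
def pooled_map(items, pool_size: 4)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # once closed and drained, pop returns nil and workers stop

  workers = Array.new(pool_size) do
    Thread.new do
      local = []
      while (job = jobs.pop)
        item, index = job
        local << [index, yield(item)]
      end
      local # becomes the thread's value
    end
  end

  # Thread#value waits for the thread and re-raises any exception from it.
  workers.flat_map(&:value).sort_by(&:first).map(&:last)
end

p pooled_map([1, 2, 3], pool_size: 2) { |x| x + 1 } # prints [2, 3, 4]
```

Each worker keeps its own result list, so no array is shared between threads while the work is running; the results are merged and reordered only after every worker has finished.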
Here's a simple little example of one way to do this. Note that there's nothing limiting the number of threads it creates at once, so you might want to create some sort of thread pool if you're running lots of threads.
[1, 2, 3].map { |x| Thread.start { x + 1 } }.map { |t| t.join.value }
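To see why this pattern helps when the work is mostly waiting, here is a small sketch in which sleep stands in for a network round trip. fake_fetch and the feed names are hypothetical placeholders and no real HTTP request is made. The waits overlap because a thread that is sleeping or blocked on I/O lets the other threads run.

```ruby
# Simulated fetch: sleep stands in for waiting on a network response.
# fake_fetch is a hypothetical placeholder, not a real HTTP call.
def fake_fetch(url)
  sleep 0.2
  "<feed src='#{url}'/>"
end

urls = %w[a.xml b.xml c.xml d.xml] # hypothetical feed names

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Thread#value waits for the thread to finish and returns its result.
bodies = urls.map { |u| Thread.start { fake_fetch(u) } }.map(&:value)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts bodies.size
puts format("%.2f seconds elapsed", elapsed)
```

Run one after another, the four simulated fetches would take about 0.8 seconds; with one thread each, the total should be close to the duration of a single fetch.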
I think the choice of implementation language depends on your application.
If you are network bound, Ruby should work fine. You might find it easier to implement concurrent requests using a reactor pattern with EventMachine. You can make HTTP requests using the EventMachine::Protocols::HttpClient class.
require 'eventmachine'

EventMachine.run {
  # `server` is assumed to hold the feed's host name
  http = EventMachine::Protocols::HttpClient.request(
    :host => server,
    :port => 80,
    :request => "/index.xml"
  )
  http.callback { |response|
    # process response
  }
}
This way you do not need to worry about concurrency and all of the associated complexity, but you will have high throughput since you can make a large number of concurrent requests.
If you are CPU bound, this won't work. If you spend most of your time processing the XML feeds rather than waiting on I/O to fetch a feed or insert into the database, then you will have to run on JRuby or run multiple Ruby processes to achieve good multi-core utilization.
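A rough sketch of the multiple-process route, using fork, IO.pipe and Marshal from core Ruby. The name process_map is made up for illustration. It assumes a Unix-like system where fork is available (so not Windows or JRuby) and results that Marshal can serialize. It starts one child process per item with no upper limit, so it only suits short lists as written.

```ruby
# Hypothetical helper: run the block for each item in its own child process
# and send the result back to the parent through a pipe.
def process_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(item))) # block runs in the child
      writer.close
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read) # read blocks until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

p process_map([1, 2, 3]) { |x| x * x } # prints [1, 4, 9]
```

Because each child is a separate process with its own interpreter, CPU-heavy blocks can run on different cores at the same time. The cost is that arguments and results are copied between processes instead of being shared.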
In the CPU-bound case I would use Clojure: if you really are CPU bound, the processing will be easier to parallelize in Clojure and just plain faster anyway.