Problems using foreach parallelization

2023-02-11 03:41 问答作者：

I'm trying to compare parallelization options. Specifically, I'm comparing the standard SNOW and mulitcore implementations to those using doSNOW or doMC and foreach. As a sample problem, I'm illustrating the central limit theorem by computing the means of samples drawn from a standard normal distribution many times. Here's the standard code:

CltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  sapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Here's the SNOW implementation:

library(snow)
cl <- makeCluster(2)

ParCltSim <- function(cluster, nSims=1000, size=100, mu=0, sigma=1){
  parSapply(cluster, 1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Next, the doSNOW method:

library(foreach)
library(doSNOW)
registerDoSNOW(cl)

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  x <- numeric(nSims)
  foreach(i=1:nSims, .combine=cbind) %dopar% {
    x[i] <- mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

I get the following results:

> system.time(CltSim(nSims=10000, size=100))
   user  system elapsed 
  0.476   0.008   0.484 
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
   user  system elapsed 
  0.028   0.004   0.375 
> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 
  8.865   0.408  11.309

The SNOW implementation shaves off about 23% of computing time relative to an unparallelized run (time savings get bigger as the number of simulations increase, as we would expect). The foreach attempt actually increases run time by a factor of 20. Additionally, if I change %dopar% to %do% and check the unparallelized version of the loop, it takes over 7 seconds.

Additionally, we can consider the multicore package. The simulation written for multicore is

library(multicore)
MCCltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  unlist(mclapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }))
}

We get an even better speed improvement than SNOW:

> system.time(MCCltSim(nSims=10000, size=100))
   user  system elapsed 
  0.924   0.032   0.307

Starting a new R session, we can attempt the foreach implementation using doMC instead of doSNOW, calling

library(doMC)
registerDoMC()

then running FECltSim() as above, still finding

> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 开发者_开发技巧
  6.800   0.024   6.887

This is "only" a 14-fold increase over the non-parallelized runtime.

Conclusion: My foreach code is not running efficiently under either doSNOW or doMC. Any idea why?

Thanks, Charlie

To follow on something Joris said, foreach() is best when the number of jobs does not hugely exceed the number of processors you will be using. Or more generally, when each job takes a significant amount of time on its own (seconds or minutes, say). There is a lot of overhead in creating the threads, so you really don't want to use it for lots of small jobs. If you were doing 10 million sims rather than 10 thousand, and you structured your code like this:

nSims = 1e7
nBatch = 1e6
foreach(i=1:(nSims/nBatch), .combine=c) %dopar% {
  replicate(nBatch, mean(rnorm(n=size, mean=mu, sd=sigma))
}

I bet you would find that foreach was doing pretty well.

Also note the use of replicate() for this kind of application rather than sapply. Actually, the foreach package has a similar convenience function, times(), which could be applied in this case. Of course, if your code is not doing a simple simulations with identical parameters every time, you will need sapply() and foreach().

To start with, you could write your foreach code a bit more concise :

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  foreach(i=1:nSims, .combine=c) %dopar% {
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

This gives you a vector, no need to explicitly make it within the loop. Also no need to use cbind, as your result is every time just a single number. So .combine=c will do

The thing with foreach is that it creates quite a lot of overhead to communicate between the cores and get the results of the different cores fit together. A quick look at the profile shows this pretty clearly :

$by.self
                         self.time self.pct total.time total.pct
$                             5.46    41.30       5.46     41.30
$<-                           0.76     5.75       0.76      5.75
.Call                         0.76     5.75       0.76      5.75
...

More than 40% of the time it is busy selecting things. It also uses a lot of other functions for the whole operation. Actually, foreach is only advisable if you have relatively few rounds through very time consuming functions.

The other two solutions are built on a different technology, and do far less in R. On a sidenode, snow is actually initially developed to work on clusters more than on single workstations, like multicore is.

继续阅读：foreach parallel-processing r

Problems using foreach parallelization

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？