Strange Behaviour Using Scala Parallel Collections and setParallelism

2023-04-01 19:11 问答作者：

I recently found out about Parallel Collection in Scala 2.9 and was excited to see that the degree of parallelism can be set using collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism.

However when I tried an experiment of adding two vectors of size one million each , I find

Using parallel collection with parallelism set to 64 is as fast as sequential (Shown in results).
Increasing setParallelism seems to increase performance in a non-linear way. I would have atleast expected monotonic behaviour (That is performance should not degrade if I increase parallelism)

Can some one explain why this is happening

  object examplePar extends App{    

  val Rnd = new Random()
  val numSims = 1

  val x = for(j <- 1 to 1000000) yield Rnd.nextDouble()
  val y = for(j <- 1 to 1000000) yield Rnd.nextDouble()

  val parInt = List(1,2,4,8,16,32,64,128,256)    
  var avg:Double = 0.0
  var currTime:Long = 0

  for(j <- parInt){
    collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(j)
    avg = 0.0
    for (k <- 1 to numSims){
      currTime = System.currentTimeMillis()    
      (x zip y).par.map(x => x._1 + x._2)
      avg += (System.currentTimeMillis() - currTime)
    }
    println("Average Time to execute with Parallelism set to " + j.toString + " = "+ (avg/numSims).toString + "ms")    
  }

  currTime = System.currentTimeMillis()
  (x zip y).map(x => x._1 + x._2)
  println("Time to execute using Sequential = " + (System.currentTimeMillis() - currTime).toString + "ms")        
}

The results on running the example using Scala 2.9.1 and a four core processor is

Average Time to execute with Parallelism set to 1 = 1047.0ms
Average Time to execute with Parallelism set to 2 = 594.0ms
Average Time to execute with Parallelism set to 4 = 672.0ms
Average Time to execute with Parallelism set to 8 = 343.0ms
Average Time to execute with Parallelism set to 16 = 375.0ms
Average Time to execute with Parallelism set to 32 = 391.0ms
Average Time to execute with Parallelism set to 64 = 406.0ms
Average Time to execute with Parallelism set to 128 = 813.0ms
Average Time to execute with Parallelism set to 256 = 469.0ms
Time to execute using Sequential = 406ms

Though these results are for one run, they are consistent when averaged ov开发者_C百科er more runs

Parallelism does not come free. It requires extra cycles to split the problem into smaller chunks, organize everything, and synchronize the result.

You can picture this as calling all your friends to help you move, waiting for them to get there, helping you load the truck, then taking them out to lunch, and finally, getting on with your task.

In your test case you are adding two doubles, which is a trivial exercise and takes so little time that overhead from parallelization is greater than simply doing the task in a one thread.

Again, the analogy would be to call all your friends to help you move 3 suitcases. It would take you half a day to get rid of them, while you could finish by yourself in minutes.

To get any benefit from parallelization your task has to be complicated enough to warrant the extra overhead. Try doing some expensive calculations, for example a formula involving a mix of 5-10 trigonometric and logarithmic functions.

I would suggest studying and using the scala.testing.Benchmark trait to benchmark snippets of code. You have to take JIT, GC and other things into account when benchmarking on the JVM - see this paper. In short, you have to do each of the runs in the separate JVM after doing a couple of warm-up runs.

Also, note that the (x zip y) part does not occur in parallel, because x and y are not yet parallel - the zip is done sequentially. Next, I would suggest turning x and y into arrays (toArray) and then calling par - this will ensure that the benchmark uses parallel arrays, rather than parallel vectors (which are slower for transformer methods such as zip and map).

继续阅读：parallel-processing scala

Strange Behaviour Using Scala Parallel Collections and setParallelism

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？