Strange Behaviour Using Scala Parallel Collections and setParallelism
I recently found out about Parallel Collection in Scala 2.9 and was excited to see that the degree of parallelism can be set using collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism.
However when I tried an experiment of adding two vectors of size one million each , I find
- Using parallel collection with parallelism set to 64 is as fast as sequential (Shown in results).
- Increasing setParallelism seems to increase performance in a non-linear way. I would have atleast expected monotonic behaviour (That is performance should not degrade if I increase parallelism)
Can some one explain why this is happening
object examplePar extends App{
val Rnd = new Random()
val numSims = 1
val x = for(j <- 1 to 1000000) yield Rnd.nextDouble()
val y = for(j <- 1 to 1000000) yield Rnd.nextDouble()
val parInt = List(1,2,4,8,16,32,64,128,256)
var avg:Double = 0.0
var currTime:Long = 0
for(j <- parInt){
collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(j)
avg = 0.0
for (k <- 1 to numSims){
currTime = System.currentTimeMillis()
(x zip y).par.map(x => x._1 + x._2)
avg += (System.currentTimeMillis() - currTime)
}
println("Average Time to execute with Parallelism set to " + j.toString + " = "+ (avg/numSims).toString + "ms")
}
currTime = System.currentTimeMillis()
(x zip y).map(x => x._1 + x._2)
println("Time to execute using Sequential = " + (System.currentTimeMillis() - currTime).toString + "ms")
}
The results on running the example using Scala 2.9.1 and a four core processor is
Average Time to execute with Parallelism set to 1 = 1047.0ms
Average Time to execute with Parallelism set to 2 = 594.0ms
Average Time to execute with Parallelism set to 4 = 672.0ms
Average Time to execute with Parallelism set to 8 = 343.0ms
Average Time to execute with Parallelism set to 16 = 375.0ms
Average Time to execute with Parallelism set to 32 = 391.0ms
Average Time to execute with Parallelism set to 64 = 406.0ms
Average Time to execute with Parallelism set to 128 = 813.0ms
Average Time to execute with Parallelism set to 256 = 469.0ms
Time to execute using Sequential = 406ms
Though these results are for one run, they are consistent when averaged ov开发者_C百科er more runs
Parallelism does not come free. It requires extra cycles to split the problem into smaller chunks, organize everything, and synchronize the result.
You can picture this as calling all your friends to help you move, waiting for them to get there, helping you load the truck, then taking them out to lunch, and finally, getting on with your task.
In your test case you are adding two doubles, which is a trivial exercise and takes so little time that overhead from parallelization is greater than simply doing the task in a one thread.
Again, the analogy would be to call all your friends to help you move 3 suitcases. It would take you half a day to get rid of them, while you could finish by yourself in minutes.
To get any benefit from parallelization your task has to be complicated enough to warrant the extra overhead. Try doing some expensive calculations, for example a formula involving a mix of 5-10 trigonometric and logarithmic functions.
I would suggest studying and using the scala.testing.Benchmark
trait to benchmark snippets of code. You have to take JIT, GC and other things into account when benchmarking on the JVM - see this paper. In short, you have to do each of the runs in the separate JVM after doing a couple of warm-up runs.
Also, note that the (x zip y)
part does not occur in parallel, because x
and y
are not yet parallel - the zip
is done sequentially. Next, I would suggest turning x
and y
into arrays (toArray
) and then calling par
- this will ensure that the benchmark uses parallel arrays, rather than parallel vectors (which are slower for transformer methods such as zip
and map
).
精彩评论