scala parallel collections degree of parallelism

2023-02-20 03:56 Q&A author:

Is there an equivalent in Scala parallel collections to LINQ's WithDegreeOfParallelism, which sets the number of threads that will run a query? I want to run an operation in parallel that needs a set number of threads running.

With the newest trunk, using JVM 1.6 or newer, use:

collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(parlevel: Int)

This may be subject to change in the future, though. A more unified approach to configuring all Scala task-parallel APIs is planned for the next releases.

Note, however, that while this determines the number of processors the query uses, it may not be the actual number of threads involved in running the query. Since parallel collections support nested parallelism, the underlying thread pool may allocate more threads to run the query if it detects that this is necessary.

EDIT:

Starting from Scala 2.10, the preferred way to set the parallelism level is through setting the tasksupport field to a new TaskSupport object, as in the following example:

scala> import scala.collection.parallel._
import scala.collection.parallel._

scala> val pc = mutable.ParArray(1, 2, 3)
pc: scala.collection.parallel.mutable.ParArray[Int] = ParArray(1, 2, 3)

scala> pc.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(2))
pc.tasksupport: scala.collection.parallel.TaskSupport = scala.collection.parallel.ForkJoinTaskSupport@4a5d484a

scala> pc map { _ + 1 }
res0: scala.collection.parallel.mutable.ParArray[Int] = ParArray(2, 3, 4)

When instantiating ForkJoinTaskSupport with a fork/join pool, the pool's parallelism level must be set to the desired value (2 in the example).
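For newer Scala versions, the same idea can be packaged as a small helper that creates a dedicated pool, runs the query on it, and shuts the pool down afterwards. This is only a sketch under stated assumptions: it targets Scala 2.12 (where parallel collections are still in the standard library) or Scala 2.13+ with the separate scala-parallel-collections module on the classpath, and it uses java.util.concurrent.ForkJoinPool directly, because scala.concurrent.forkjoin.ForkJoinPool from the session above has been deprecated since 2.12, where it is only an alias for the java.util.concurrent class. The names `LimitedParallelism`, `withParallelism` and `doubleAll` are hypothetical.

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.mutable.ParArray

object LimitedParallelism {
  // Hypothetical helper: creates a dedicated ForkJoinPool with the requested
  // parallelism level, hands a TaskSupport backed by it to `body`, and shuts
  // the pool down afterwards so its worker threads do not linger.
  def withParallelism[A](level: Int)(body: ForkJoinTaskSupport => A): A = {
    val pool = new ForkJoinPool(level)
    try body(new ForkJoinTaskSupport(pool))
    finally pool.shutdown()
  }

  // Hypothetical example operation: doubles every element, running the
  // parallel map on the dedicated pool instead of the default one.
  def doubleAll(xs: Seq[Int], level: Int): Vector[Int] =
    withParallelism(level) { ts =>
      val pc = ParArray(xs: _*)
      pc.tasksupport = ts // must be assigned before the operation runs
      pc.map(_ * 2).seq.toVector // map is strict, so it completes before shutdown
    }
}
```

For example, `LimitedParallelism.doubleAll(1 to 5, 2)` gives `Vector(2, 4, 6, 8, 10)`. Creating a pool per call keeps the example self-contained but is not free; if the same limit is used for many queries, reusing one long-lived pool is probably the better choice.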

Independently of the JVM version, with Scala 2.9+ (the release that introduced parallel collections), you can also combine the grouped(Int) and par methods to execute parallel jobs on small chunks, like this:

scala> val c = 1 to 5
c: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)

scala> c.grouped(2).seq.flatMap(_.par.map(_ * 2)).toList
res11: List[Int] = List(2, 4, 6, 8, 10)

grouped(2) creates chunks of length 2 or less, and seq makes sure the collection of chunks is not itself parallel (redundant in this example, since grouped already returns a sequential iterator). The _ * 2 function is then executed on each small parallel chunk (created with par), ensuring that at most 2 threads execute in parallel.
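The chunking approach can be wrapped in a small generic helper. This is a sketch with hypothetical names (`ChunkedParallelism`, `chunkedMap`), under the same assumption of Scala 2.12 or Scala 2.13+ with the scala-parallel-collections module. It builds each chunk's parallel collection with `ParArray(chunk: _*)` instead of `.par`, only so that it compiles without the `CollectionConverters` import that Scala 2.13 requires for `.par`; each chunk still runs on the default pool, and the bound comes solely from the chunk size.

```scala
import scala.collection.parallel.mutable.ParArray

object ChunkedParallelism {
  // Hypothetical helper: splits `xs` into consecutive chunks of at most
  // `chunkSize` elements, maps each chunk in parallel, and processes the
  // chunks one after another, so no more than `chunkSize` elements are
  // being worked on at any moment. Element order is preserved.
  def chunkedMap[A, B](xs: Seq[A], chunkSize: Int)(f: A => B): List[B] =
    xs.grouped(chunkSize)                            // sequential Iterator of chunks
      .flatMap(chunk => ParArray(chunk: _*).map(f).seq) // parallel within a chunk
      .toList
}
```

`ChunkedParallelism.chunkedMap(1 to 5, 2)(_ * 2)` yields `List(2, 4, 6, 8, 10)`, the same result as the REPL session above. Because chunks run strictly one after another, each chunk waits for its slowest element before the next one starts.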

However, this might be slightly less efficient than setting the pool's parallelism level; I'm not sure about that.
