Dealing with the surprising lack of ParList in scala.collections.parallel

2023-03-19 17:13 问答作者：

So scala 2.9 recen开发者_运维百科tly turned up in Debian testing, bringing the newfangled parallel collections with it.

Suppose I have some code equivalent to

  def expensiveFunction(x:Int):Int = {...}

  def process(s:List[Int]):List[Int} = s.map(expensiveFunction)

now from the teeny bit I'd gleaned about parallel collections before the docs actually turned up on my machine, I was expecting to parallelize this just by switching the List to a ParList... but to my surprise, there isn't one! (Just ParVector, ParMap, ParSet...).

As a workround, this (or a one-line equivalent) seems to work well enough:

  def process(s:List[Int]):List[Int} = {
    val ps=scala.collection.parallel.immutable.ParVector()++s
    val pr=ps.map(expensiveFunction)
    List()++pr
  }

yielding an approximately x3 performance improvement in my test code and achieving massively higher CPU usage (quad core plus hyperthreading i7). But it seems kind of clunky.

My question is a sort of an aggregated:

Why isn't there a ParList ?
Given there isn't a ParList, is there a better pattern/idiom I should adopt so that I don't feel like they're missing ?
Am I just "behind the times" using Lists a lot in my scala programs (like all the Scala books I bought back in the 2.7 days taught me) and I should actually be making more use of Vectors ? (I mean in C++ land I'd generally need a pretty good reason to use std::list over std::vector).

Lists are great when you want pattern matching (i.e. case x :: xs) and for efficient prepending/iteration. However, they are not so great when you want fast access-by-index, or splitting into chunks, or joining (i.e. xs ::: ys).

Hence it does not make much sense (to have a parallel List) when you think that this kind of thing (splitting and joining) is exactly what is needed for efficient parallelism. Use:

xs.toIndexedSeq.par

First, let me show you how to make a parallel version of that code:

def expensiveFunction(x:Int):Int = {...}

def process(s:List[Int]):Seq[Int] = s.par.map(expensiveFunction).seq

That will have Scala figure things out for you -- and, by the way, it uses ParVector. If you really want List, call .toList instead of .seq.

As for the questions:

There isn't a ParList because a List is an intrinsically non-parallel data structure, because any operation on it requires traversal.
You should code to traits instead of classes -- Seq, ParSeq and GenSeq, for example. Even performance characteristics of a List are guaranteed by LinearSeq.
All the books before Scala 2.8 did not have the new collections library in mind. In particular, the collections really didn't share a consistent and complete API. Now they do, and you'll gain much by taking advantage of it.

Furthermore, there wasn't a collection like Vector in Scala 2.7 -- an immutable collection with (near) constant indexed access.

A List cannot be easily split into various sub-lists which makes it hard to parallelise. For one, it has O(n) access; also a List cannot strip its tail, so one need to include a length parameter.

I guess, taking a Vector will be the better solution.

Note that Scala’s Vector is different from std::vector. The latter is basically a wrapper around standard array, a contiguous block in memory which needs to be copied every now and then when adding or removing data. Scala’s Vector is a specialised data structure which allows for efficient copying and splitting while keeping the data itself immutable.

继续阅读：list parallel-processing scala scala-collections

Dealing with the surprising lack of ParList in scala.collections.parallel

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？