Is there an implementation of rapid concurrent syntactical sugar in scala? eg. map-reduce
Passing messages around with actors is great. But I would like to have even easier code.
Examples (Pseudo-code)
val splicedList:List[List[Int]]=biglist.partition(100)
val sum:Int=ActorPool.numberOfActors(5).getAllResults(splicedList,foldLeft(_+_))
where spliceIntoParts turns one big list into开发者_如何学Go 100 small lists the numberofactors part, creates a pool which uses 5 actors and receives new jobs after a job is finished and getallresults uses a method on a list. all this done with messages passing in the background. where maybe getFirstResult, calculates the first result, and stops all other threads (like cracking a password)
With Scala Parallel collections that will be included in 2.8.1 you will be able to do things like this:
val spliced = myList.par // obtain a parallel version of your collection (all operations are parallel)
spliced.map(process _) // maps each entry into a corresponding entry using `process`
spliced.find(check _) // searches the collection until it finds an element for which
// `check` returns true, at which point the search stops, and the element is returned
and the code will automatically be done in parallel. Other methods found in the regular collections library are being parallelized as well.
Currently, 2.8.RC2 is very close (this or next week), and 2.8 final will come in a few weeks after, I guess. You will be able to try parallel collections if you use 2.8.1 nightlies.
You can use Scalaz's concurrency features to achieve what you want.
import scalaz._
import Scalaz._
import concurrent.strategy.Executor
import java.util.concurrent.Executors
implicit val s = Executor.strategy[Unit](Executors.newFixedThreadPool(5))
val splicedList = biglist.grouped(100).toList
val sum = splicedList.parMap(_.sum).map(_.sum).get
It would be pretty easy to make this prettier (i.e. write a function mapReduce that does the splitting and folding all in one). Also, parMap over a List is unnecessarily strict. You will want to start folding before the whole list is ready. More like:
val splicedList = biglist.grouped(100).toList
val sum = splicedList.map(promise(_.sum)).toStream.traverse(_.sum).get
You can do this with less overhead than creating actors by using futures:
import scala.actors.Futures._
val nums = (1 to 1000).grouped(100).toList
val parts = nums.map(n => future { n.reduceLeft(_ + _) })
val whole = (0 /: parts)(_ + _())
You have to handle decomposing the problem and writing the "future" block and recomposing it in to a final answer, but it does make executing a bunch of small code blocks in parallel easy to do.
(Note that the _()
in the fold left is the apply function of the future, which means, "Give me the answer you were computing in parallel!", and it blocks until the answer is available.)
A parallel collections library would automatically decompose the problem and recompose the answer for you (as with pmap
in Clojure); that's not part of the main API yet.
I'm not waiting for Scala 2.8.1 or 2.9, it would rather be better to write my own library or use another, so I did more googling and found this: akka http://doc.akkasource.org/actors
which has an object futures with methods
awaitAll(futures: List[Future]): Unit
awaitOne(futures: List[Future]): Future
but http://scalablesolutions.se/akka/api/akka-core-0.8.1/ has no documentation at all. That's bad.
But the good part is that akka's actors are leaner than scala's native ones
With all of these libraries (including scalaz) around, it would be really great if scala itself could eventually merge them officially
At Scala Days 2010, there was a very interesting talk by Aleksandar Prokopec (who is working on Scala at EPFL) about Parallel Collections. This will probably be in 2.8.1, but you may have to wait a little longer. I'll lsee if I can get the presentation itself. to link here.
The idea is to have a collections framework which parallelizes the processing of the collections by doing exactly as you suggest, but transparently to the user. All you theoretically have to do is change the import from scala.collections to scala.parallel.collections. You obviously still have to do the work to see if what you're doing can actually be parallelized.
精彩评论