Haskell -- parallel map that makes less sparks
I want to write a parallel map function in Haskell that's as efficient as possible. My initial attempt, which seems to be currently best, is to simply write,
pmap :: (a -> b) -> [a] -> [b]
pmap f = runEval . parList rseq . map f
I'm not seeing perfect CPU division, however. If this is possibly related to the number of sparks, could I write a pmap that divides the list into # of cpus segments, so there are minimal sparks created? I tried the following, but the peformance (and number of sparks) is much worse,
pmap :: (a -> b) -> [a] -> [b]
pmap f xs = concat $ runEval $ parList rseq $ map (map f) (chunk xs) where
-- the (len / 4) argument r开发者_如何转开发epresents the size of the sublists
chunk xs = chunk' ((length xs) `div` 4) xs
chunk' n xs | length xs <= n = [xs]
| otherwise = take n xs : chunk (drop n xs)
The worse performance may be correlated with the higher memory use. The original pmap does scale somewhat on 24-core systems, so it's not that I don't have enough data. (The number of CPU's on my desktop is 4, so I just hardcoded that).
Edit 1
Some performance data using +RTS -H512m -N -sstderr -RTS
is here:
- my 4-core desktop
- 24-core server
The parallel package defines a number of parallel map strategies for you:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
A combination of parList and map, and specific support for chunking the list:
parListChunk :: Int -> Strategy a -> Strategy [a]
Divides a list into chunks, and applies the strategy evalList strat
to each chunk in parallel.
You should be able to use a combination of these to get any sparking behavior you desire. Or, for even more control, the Par
monad package, for controlling the amount of threads created (purely).
References: The haddock docs for the parallel package
精彩评论