Haskell Stack Overflow
I'm writing a genetic algorithm to generate the string "helloworld". But the evolve function creates a stack overflow when n is 10,000 or more.
module Genetics where
import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)
class Gene g where
-- How ideal is the gene from 0.0 to 1.0?
fitness :: g -> Float
-- How does a gene mutate?
mutate :: g -> IO g
-- How many species will be explored?
species :: [g] -> Int
orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))
compete :: (Gene g) => [g] -> IO [g]
compete pool = do
let s = species pool
variants <- (mapM (mapM mutate) . map (replicate s)) pool
let pool' = (map head . map orderFitness) variants
return pool'
evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
pool' <- compete pool
evolve (n - 1) pool'
With species pool = 8
, a pool of 8 genes replicates to 8 groups. Each group mutates, and the fittest of each group is selected for further evolution 开发者_开发知识库(back to 8 genes).
GitHub
If you're interested in performance, I'd use a fast random number generator, such as:
- The mersenne-random package
Secondly, compete
looks very suspicious, as it is entirely lazy, despite building some potentially large structures. Try rewriting it to be a bit stricter, using the deepseq hammer:
import Control.DeepSeq
compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
let s = species pool
variants <- (mapM (mapM mutate) . map (replicate s)) pool
let pool' = (map head . map orderFitness) variants
pool' `deepseq` return pool'
None of this stuff needs to be in IO, though, (separate issue). Something like the Rand
monad may be more appropriate.
Thanks to Don's deepseq
suggestion, I was able to narrow the problem to mapM mutate
which made too many thunks. The new version has mutate'
, which uses seq
to prevent thunking.
module Genetics where
import Data.List (maximumBy)
import Random (randomRIO)
class Gene g where
-- How ideal is the gene from 0.0 to 1.0?
fitness :: g -> Float
-- How does a gene mutate?
mutate :: g -> IO g
-- How many species will be explored in each round?
species :: [g] -> Int
best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))
-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
gene' <- mutate gene
gene' `seq` return gene'
drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')
compete :: (Gene g) => [g] -> IO [g]
compete pool = do
let islands = map (replicate (species pool)) pool
islands' <- drift islands
let representatives = map best islands'
return representatives
evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)
GitHub
Instead of using (map head . map orderFitness)
where orderFitness
is a sortBy
you could use maximumBy
and a single map
. That doesn't save too much (since you are going from an O(n log n) to O(n) and might be getting another factor of two from eliminating the double map), but is at the least somewhat simpler and more efficient. You would also get rid of a call to reverse.
I doubt this fixes the problem without a deepseq
, but it should be an improvement none the less.
Edit: if the standard library and GHC were perfect, then head . sortBy
would produce identical code to maximumBy
and map head . map sortBy
would produce identical code to map (head . sortBy)
sadly neither of these things is likely to be true in practice. sortBy
will tend to do a bunch of extra memory allocation because it is a divide and conquer algorithm. Combining maps is an optimization you sometimes get, but should not count on.
More importantly, using maximumBy
is more declarative. It is easier to see what the code does and how long it will take. It should also be easier to take advantage of in optimizations, because we know what the goal is, not just how we get it.
精彩评论