Haskell Stack Overflow

2023-03-05 10:09 问答作者：

I'm writing a genetic algorithm to generate the string "helloworld". But the evolve function creates a stack overflow when n is 10,000 or more.

module Genetics where

import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored?
    species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
    pool' <- compete pool
    evolve (n - 1) pool'

With species pool = 8, a pool of 8 genes replicates to 8 groups. Each group mutates, and the fittest of each group is selected for further evolution 开发者_开发知识库(back to 8 genes).

GitHub

If you're interested in performance, I'd use a fast random number generator, such as:

The mersenne-random package

Secondly, compete looks very suspicious, as it is entirely lazy, despite building some potentially large structures. Try rewriting it to be a bit stricter, using the deepseq hammer:

import Control.DeepSeq    

compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    pool' `deepseq` return pool'

None of this stuff needs to be in IO, though, (separate issue). Something like the Rand monad may be more appropriate.

Thanks to Don's deepseq suggestion, I was able to narrow the problem to mapM mutate which made too many thunks. The new version has mutate', which uses seq to prevent thunking.

module Genetics where

import Data.List (maximumBy)
import Random (randomRIO)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored in each round?
    species :: [g] -> Int

best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))

-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
    gene' <- mutate gene
    gene' `seq` return gene'

drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let islands = map (replicate (species pool)) pool
    islands' <- drift islands
    let representatives = map best islands'
    return representatives

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

Instead of using (map head . map orderFitness) where orderFitness is a sortBy you could use maximumBy and a single map. That doesn't save too much (since you are going from an O(n log n) to O(n) and might be getting another factor of two from eliminating the double map), but is at the least somewhat simpler and more efficient. You would also get rid of a call to reverse.

I doubt this fixes the problem without a deepseq, but it should be an improvement none the less.

Edit: if the standard library and GHC were perfect, then head . sortBy would produce identical code to maximumBy and map head . map sortBy would produce identical code to map (head . sortBy) sadly neither of these things is likely to be true in practice. sortBy will tend to do a bunch of extra memory allocation because it is a divide and conquer algorithm. Combining maps is an optimization you sometimes get, but should not count on.

More importantly, using maximumBy is more declarative. It is easier to see what the code does and how long it will take. It should also be easier to take advantage of in optimizations, because we know what the goal is, not just how we get it.

继续阅读：genetic-algorithm haskell memory random stack-overflow

Haskell Stack Overflow

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？