How to control laziness on example of Double each element of list

2023-01-30 16:54 问答作者：

I tried to solve Duplicate each element in a Haskell List task and make it as a full program, that write the list to standard output

Here is my solution

main :: IO ()
main = getList >>= return . doubleEachItem >>= putStrLn . show

getList = return [1,3,5]

doubleEachItem :: [Int] -> [Int]
doubleEachItem = foldr (++) [] . map (take 2 . repeat)

But when I try to process really long list

getList = return . take 10000000000 $ repeat 15

the program terminated with out of memory error.

The question is: how I can improve the program, so it would be able to process lists of any size?

Edit:

I think program crashed because I run it with runghc command. In that case ghc consume about 3Gbytes of memory and was killed by operating system. The output file after crash was only about 0.6开发者_开发知识库GBytes. The reason of such behavior is unclear for me.

When I compile the program to native executable ghc haskell03.hs -o haskell03 and execute it with redirection to a file ./haskell03 >out03.txt it perfectly runs in constant memory space and produces output file with speed about 20MBytes/sec. When the program finished the output file takes 57GBytes. Total execution time was 47 minutes.

Oh! I think I know what's going on. This is subtle.

runghc runs in interpreted mode, as if you had loaded an run the file through ghci. getList is a top-level binding, so its value (and all values it references) are cached. That means that GHC was trying to cache your entire ten billion element array.

You might think that if you inlined the list it would work:

main = putStrLn . show . doubleEachElement $ [1..10^10]

But not so, because main is a top-level binding as well, and it references [1..10^10], so the giant unfolded version of that list will be cached there as well.

The general rule is, when something doesn't depend on the input, it can be cached. So you can fix this by making it appear to depend on the input. In interpreted mode, GHC is not very smart about this, so it's fairly easy to trick it:

main = do
    n <- return (10^10)
    putStrLn . show . doubleEachElement $ [1..n]

This would work with your getList as well, if getList took a parameter instead of hard-coding 10^10.

Fortunately, in compiled mode, GHC looks harder at the uses of these top-level symbols, so it can see that the list is only used once, and thus can begin garbage collecting its heads as it is output.

This problem does not come up in the real world very often, because programs often depend heavily on their input, so there aren't any top level big constants like this. But yeah, it can be a pain when GHC tries to cache something you don't want it to.

Your code is perfectly lazy, it doesn't need any optimization. What is probably going on is that all the output is on a single line, which is being buffered. You can fix this by turning off buffering:

import System.IO
main = do
    hSetBuffering stdout NoBuffering
    getList >>= ...

Or by, say, printing each number on its own line

main = getList >>= return . doubleEachItem >>= mapM_ print

This the application of answer of luqui to original program from the question with minimum modification

main :: IO ()
main = return 1 >>= getList >>= return . doubleEachElement >>= putStrLn . show

getList doNotCache = return . take (10^10) $ repeat 15

doubleEachElement :: [Int] -> [Int]
doubleEachElement = foldr (++) [] . map (take 2 . repeat)

The changes are emphasized with bold.

As luqui has shown, getList was cached because it was constant expression. But we can make haskell to believe that it is not constant simply by providing dummy argument, dependent on input.

Your doubleEachItem implementation via foldr require whole list constructed in memory, because foldr beginning work from end of list. To avoid this, replace foldr to foldl' or rewrite function as:

doubleEachItem = concatMap (replicate 2)

继续阅读：haskell lazy-evaluation

How to control laziness on example of Double each element of list

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？