Stack overflow in OCaml and F# but not in Haskell

2022-12-21 05:47 问答作者：

I've been comparing for fun different languages for speed in execution of the following program: for i from 1 to 1000000 sum the product i*(sqrt i)

One of my implementations (not the only one) is constructing a list [1..1000000] and then folding with a specific function.

The program works fine and fast in Haskell (even when using foldl and not foldl') but stack overflows in OCaml and F#.

Here开发者_StackOverflow中文版 is the Haskell code:

test = foldl (\ a b -> a + b * (sqrt b)) 0

create 0 = []
create n = n:(create (n-1))

main = print (test (create 1000000))

And here is the OCaml one:

let test = List.fold_left (fun a b -> a +. (float_of_int b) *. (sqrt (float_of_int b))) 0. ;;

let rec create = function
    | 0 -> []
    | n -> n::(create (n-1)) ;;

print_float (test (create 1000000));;

Why does the OCaml/F# implementation stack overflows?

The Haskell code for create evaluates the list lazily, i.e. as elements are needed by foldl. The entire list is not needed all at once.

By contrast, F# and OCaml evaluate the create list strictly, i.e. the code tries to generate the entire list in one go before passing it to fold_left.

One possibility in F# is to use a seq expression in the create function: this generates the list lazily in the same way as Haskell. (I'm not sure if OCaml has an equivalent feature.)

Firstly, if you're doing performance comparisons for numeric stuff, lists aren't the best choice. Try a package like the vector package for fast arrays.

And note that you can do even better in Haskell, thanks to loop fusion. By writing the create function as an enumeration the compiler can combine the create step, and the fold loop, into a single loop that allocates no intermediate data structures. The ability to do general fusion like this is unique to GHC Haskell.

I'll use the vector library (stream-based loop fusion):

import qualified Data.Vector as V

test = V.foldl (\ a b -> a + b * sqrt b) 0

create n = (V.enumFromTo 1 n)

main = print (test (create 1000000))

Now, before, with your code, the compiler is unable to remove all lists, and we end up with an inner loop like:

$wlgo :: Double# -> [Double] -> Double#

$wlgo =
  \ (ww_sww :: Double#) (w_swy :: [Double]) ->
    case w_swy of _ {
      [] -> ww_sww;
      : x_aoY xs_aoZ ->
        case x_aoY of _ { D# x1_aql ->
        $wlgo
          (+##
             ww_sww (*## x1_aql (sqrtDouble# x1_aql)))
          xs_aoZ
        }
    }

$wcreate :: Double# -> [Double]

$wcreate =
  \ (ww_swp :: Double#) ->
    case ==## ww_swp 0.0 of _ {
      False ->
        :
          @ Double
          (D# ww_swp)
          ($wcreate (-## ww_swp 1.0));
      True -> [] @ Double
    }

Note how there are two loops: create generating a (lazy) list, and the fold consuming it. Thanks to laziness, the cost of that list is cheap, so it runs in a respectable:

$ time ./C
4.000004999999896e14
./C  0.06s user 0.00s system 98% cpu 0.058 total

Under fusion, however, we get instead a single loop only!

main_$s$wfoldlM_loop :: Double# -> Double# -> Double#

main_$s$wfoldlM_loop =
  \ (sc_sYc :: Double#) (sc1_sYd :: Double#) ->
    case <=## sc_sYc 1000000.5 of _ {
      False -> sc1_sYd;
      True ->
        main_$s$wfoldlM_loop
          (+## sc_sYc 1.0)
          (+##
             sc1_sYd (*## sc_sYc (sqrtDouble# sc_sYc)))

GHC reduced our create and test steps into a single loop with no lists used. Just 2 doubles in registers. And with half as many loops, it runs nearly twice as fast:

$ ghc D.hs -Odph -fvia-C -optc-O3 -optc-march=native -fexcess-precision --make
$ time ./D
4.000005000001039e14
./D  0.04s user 0.00s system 95% cpu 0.038 total

This is a nice example of the power that guarantees of purity provide -- the compiler can be very aggressive at reording your code.

Another way to fix the code so that it works in F# is to use sequence expressions that are also generated lazily and in a way that doesn't cause StackOverflow (this won't work in OCaml, because sequence expressions are F# specific). Here is the code:

let test nums = Seq.fold (fun a b -> 
  a + (float b) * (sqrt (float b))) 0.0 nums

let rec create count = seq {
  match count with
  | 0 -> do ()
  | n -> yield n
         yield! create (n-1) }

printf "%f" (test (create 1000000));;

I'm not sure about the performance, however the compiler definitely does an optimization in the create function, so that iterating over the generated sequence should be reasonably fast. However, the aim of sequence expressions is mainly readability (since F# is not pure, it gives you a lot of space for manual optimizations if you really need them as opposed to Haskell which needs to optimize the pure code you write). Anyway, if you have some tests, you can give it a try!

In this form your create function is not tail-recursive. You can rewrite it in a tail recursive form that doesn't cause a stack overflow:

let rec create n l =
    match n with 
    | 0 -> l
    | n -> create (n-1) (n::l)

print_float (test (create 1000000 []));;

The program works fine and fast in Haskell (even when using foldl and not foldl') but stack overflows in OCaml and F#.

Your Haskell is using lazy lists but your OCaml/F# is using strict lists, so your programs are incomparable.

FWIW, a solution using on-demand sequences in F# is:

Seq.sumBy (fun i -> let i = float i in i * sqrt i) (seq {1..1000000})

继续阅读：f#haskell lazy-evaluation ocaml stack-overflow

Stack overflow in OCaml and F# but not in Haskell

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？