Haskell - efficient equivalent of for loop?
I've been doing some experiments and here's something I found. Consider the following C program:
int main()
{
for(int i = 0;
i < 1000000;
++i)
{}
}
and the following Haskell program:
import System.IO
loop :: Int -> IO ()
loop n = if 0 == n then return () else loop (n-1)
main = loop 1000000
Here is the output of time
for the above C program:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
...and for the Haskell program:
real 0m0.028s
user 0m0.027s
sys 0m0.000s
At first I thought that gcc detected an empty loop and o开发者_如何学运维ptimized it out, but after increasing the number of iterations, the running time of the program also increased. Here are the outputs of time
for both of the programs, with the number of iterations set to 10000000:
C version
real 0m0.024s
user 0m0.023s
sys 0m0.000s
Haskell version
real 0m0.245s
user 0m0.247s
sys 0m0.000s
As you can see, the Haskell program is 10 times slower.
Question is: what is the efficient alternative to the for
loop in Haskell? As we just saw, simple recursion slows the program down about 10 times (and this is probably with the tail recursion optimizations done).
Firstly, you'd translate your C code to this,
main = go 0
where
go :: Int -> IO ()
go i | i < 1000000 = go (i+1)
| otherwise = return ()
which ghc optimizes to the empty program. It moves the final value into a register, compares against it, and returns ()
:
Main_zdwa_info:
cmpq $1000000, %r14 # imm = 0xF4240
movl $1000000, %eax # imm = 0xF4240
cmovlq %rax, %r14
movq (%rbp), %rax
movl $ghczmprim_GHCziUnit_Z0T_closure+1, %ebx
jmpq *%rax # TAILCALL
which when run:
$ time ./A
./A 0.00s user 0.00s system 88% cpu 0.004 total
takes no time.
In general, though, GHC emits equivalent loops to e.g. GCC, for tail-recursive functions. In particular, you'll want to compile your numeric benchmarks with ghc -O2 -fllvm
for best results. If you don't want your programs optimized, then ghc will happily execute exactly the program you specify, which, in this case, involves lots of redundant work that would be removed at higher optimization levels.
For more information on analyzing low-level performance of Haskell programs, check RWH ch25.
For looping constructs, you often have to write in the Worker/Wrapper style to help GHC spot optimizations rather than recur at the outer function level.
Grigory Javadyan has said in a comment that the original version gets optimized out at -O3, I'd expect this version gets detected at -O2:
import System.IO
loop :: Int -> IO ()
loop n = go n
where
go n | n <= 0 = return ()
go n = go (n-1)
main = loop 1000000
精彩评论