How to improve the performance of this Haskell program?
I'm working through the problems in Project Euler as a way of learning Haskell, and I find that my programs are a lot slower than a comparable C version, even when compiled. What can I do to speed up my Haskell programs?
For example, my brute-force solution to Problem 14 is:
import Data.Int
import Data.Ord
import Data.List
searchTo = 1000000
nextNumber :: Int64 -> Int64
nextNumber n
| even n = n `div` 2
| otherwise = 3 * n + 1
sequenceLength :: Int64 -> Int
sequenceLength 1 = 1
sequenceLength n = 1 + (sequenceLength next)
where next = nextNumber n
longestSeque开发者_如何学JAVAnce = maximumBy (comparing sequenceLength) [1..searchTo]
main = putStrLn $ show $ longestSequence
Which takes around 220 seconds, while an "equivalent" brute-force C version only takes 1.2 seconds.
#include <stdio.h>
int main(int argc, char **argv)
{
int longest = 0;
int terms = 0;
int i;
unsigned long j;
for (i = 1; i <= 1000000; i++)
{
j = i;
int this_terms = 1;
while (j != 1)
{
this_terms++;
if (this_terms > terms)
{
terms = this_terms;
longest = i;
}
if (j % 2 == 0)
j = j / 2;
else
j = 3 * j + 1;
}
}
printf("%d\n", longest);
return 0;
}
What am I doing wrong? Or am I naive to think that Haskell could even approach C's speed?
(I'm compiling the C version with gcc -O2, and the Haskell version with ghc --make -O).
For testing purpose I have just set searchTo = 100000
. The time taken is 7.34s. A few modification leads to some big improvement:
Use an
Integer
instead ofInt64
. This improves the time to 1.75s.Use an accumulator (you don't need sequenceLength to be lazy right?) 1.54s.
seqLen2 :: Int -> Integer -> Int seqLen2 a 1 = a seqLen2 a n = seqLen2 (a+1) (nextNumber n) sequenceLength :: Integer -> Int sequenceLength = seqLen2 1
Rewrite the
nextNumber
usingquotRem
, thus avoiding computing the division twice (once ineven
and once indiv
). 1.27s.nextNumber :: Integer -> Integer nextNumber n | r == 0 = q | otherwise = 6*q + 4 where (q,r) = quotRem n 2
Use Schwartzian transform instead of
maximumBy
. The problem ofmaximumBy . comparing
is that thesequenceLength
function is called more than once for each value. 0.32s.longestSequence = snd $ maximum [(sequenceLength a, a) | a <- [1..searchTo]]
Note:
- I check the time by compiling with
ghc -O
and run with+RTS -s
) - My machine is running on Mac OS X 10.6. The GHC version is 6.12.2. The compiled file is in i386 architecture.)
- The C problem runs at 0.078s with the corresponding parameter. It is compiled with
gcc -O3 -m32
.
Although this is already rather old, let me chime in, there's one crucial point that hasn't been addressed before.
First, the timings of the different programmes on my box. Since I'm on a 64-bit linux system, they show somewhat different characteristics: using Integer
instead of Int64
does not improve performance as it would with a 32-bit GHC, where each Int64
operation would incur the cost of a C-call while the computations with Integer
s fitting in signed 32-bit integers don't need a foreign call (since only few operations exceed that range here, Integer
is the better choice on a 32-bit GHC).
- C: 0.3 seconds
- Original Haskell: 14.24 seconds, using
Integer
instead ofInt64
: 33.96 seconds - KennyTM's improved version: 5.55 seconds, using
Int
: 1.85 seconds - Chris Kuklewicz's version: 5.73 seconds, using
Int
: 1.90 seconds - FUZxxl's version: 3.56 seconds, using
quotRem
instead ofdivMod
: 1.79 seconds
So what have we?
- Calculate the length with an accumulator so the compiler can transform it (basically) into a loop
- Don't recalculate the sequence lengths for the comparisons
- Don't use
div
resp.divMod
when it's not necessary,quot
resp.quotRem
are much faster
What is still missing?
if (j % 2 == 0)
j = j / 2;
else
j = 3 * j + 1;
Any C compiler I have used transforms the test j % 2 == 0
into a bit-masking and doesn't use a division instruction. GHC does not (yet) do that. So testing even n
or computing n `quotRem` 2
is quite an expensive operation. Replacing nextNumber
in KennyTM's Integer
version with
nextNumber :: Integer -> Integer
nextNumber n
| fromInteger n .&. 1 == (0 :: Int) = n `quot` 2
| otherwise = 3*n+1
reduces its running time to 3.25 seconds (Note: for Integer
, n `quot` 2
is faster than n `shiftR` 1
, that takes 12.69 seconds!).
Doing the same in the Int
version reduces its running time to 0.41 seconds. For Int
s, the bit-shift for division by 2 is a bit faster than the quot
operation, reducing its running time to 0.39 seconds.
Eliminating the construction of the list (that doesn't appear in the C version either),
module Main (main) where
import Data.Bits
result :: Int
result = findMax 0 0 1
findMax :: Int -> Int -> Int -> Int
findMax start len can
| can > 1000000 = start
| canlen > len = findMax can canlen (can+1)
| otherwise = findMax start len (can+1)
where
canlen = findLen 1 can
findLen :: Int -> Int -> Int
findLen l 1 = l
findLen l n
| n .&. 1 == 0 = findLen (l+1) (n `shiftR` 1)
| otherwise = findLen (l+1) (3*n+1)
main :: IO ()
main = print result
yields a further small speedup, resulting in a running time of 0.37 seconds.
So the Haskell version that's in close correspondence to the C version doesn't take that much longer, it's a factor of ~1.3.
Well, let's be fair, there's an inefficiency in the C version that's not present in the Haskell versions,
if (this_terms > terms)
{
terms = this_terms;
longest = i;
}
appearing in the inner loop. Lifting that out of the inner loop in the C version reduces its running time to 0.27 seconds, making the factor ~1.4.
The comparing may be recomputing sequenceLength
too much. This is my best version:
type I = Integer
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !I deriving (Eq,Ord,Show)
searchTo = 1000000
nextNumber :: I -> I
nextNumber n = case quotRem n 2 of
(n2,0) -> n2
_ -> 3*n+1
sequenceLength :: I -> Int
sequenceLength x = count x 1 where
count 1 acc = acc
count n acc = count (nextNumber n) (succ acc)
longestSequence = maximum . map (\i -> P (sequenceLength i) i) $ [1..searchTo]
main = putStrLn $ show $ longestSequence
The answer and timing are slower than C, but it does use arbitrary precision integers (through the Integer
type):
ghc -O2 --make euler14-fgij.hs
time ./euler14-fgij
P 525 837799
real 0m3.235s
user 0m3.184s
sys 0m0.015s
Haskell's lists are heap-based, whereas your C code is exceedingly tight and makes no heap use at all. You need to refactor to remove the dependency on lists.
Even if I'm a bit late, here is mine, I removed the dependency on lists and this solution uses no heap at all too.
{-# LANGUAGE BangPatterns #-}
-- Compiled with ghc -O2 -fvia-C -optc-O3 -Wall euler.hs
module Main (main) where
searchTo :: Int
searchTo = 1000000
nextNumber :: Int -> Int
nextNumber n = case n `divMod` 2 of
(k,0) -> k
_ -> 3*n + 1
sequenceLength :: Int -> Int
sequenceLength n = sl 1 n where
sl k 1 = k
sl k x = sl (k + 1) (nextNumber x)
longestSequence :: Int
longestSequence = testValues 1 0 0 where
testValues number !longest !longestNum
| number > searchTo = longestNum
| otherwise = testValues (number + 1) longest' longestNum' where
nlength = sequenceLength number
(longest',longestNum') = if nlength > longest
then (nlength,number)
else (longest,longestNum)
main :: IO ()
main = print longestSequence
I compiled this piece with ghc -O2 -fvia-C -optc-O3 -Wall euler.hs
and it runs in 5 secs, compared to 80 of the beginning implementation. It doesn't uses Integer
, but because I'm on a 64-bit machine, the results may be cheated.
The compiler can unbox all Int
s in this case, resulting in really fast code. It runs faster than all other solutions I've seen so far, but C is still faster.
精彩评论