Inlined functions still show up in the .prof file
I'm trying to figure out how to optimize some code. Here it is:
{-# OPTIONS_GHC -funbox-strict-fields #-}
data Vec3 a = Vec3 !a !a !a
vx :: Vec3 a -> a
vx (Vec3 x _ _) = x
{-# SPECIALIZE INLINE vx :: Vec3 Double -> Double #-}
vy :: Vec3 a -> a
vy (Vec3 _ y _) = y
{-# SPECIALIZE INLINE vy :: Vec3 Double -> Double #-}
vz :: Vec3 a -> a
vz (Vec3 _ _ z) = z
{-# SPECIALIZE INLINE vz :: Vec3 Double -> Double #-}
dot :: (Num a) => Vec3 a -> Vec3 a -> a
dot u v = (vx u * vx v) + (vy u * vy v) + (vz u * vz v)
{-# SPECIALIZE INLINE dot :: Vec3 Double -> Vec3 Double -> Double #-}
type Vec3D = Vec3 Double
-- just make a bunch of vecs to measure performance
n = 1000000 :: Double
v1s = [Vec3 x y z | (x, y, z) <- zip3 [1 .. n] [2 .. n + 1] [3 .. n + 2]]
  :: [Vec3D]
v2s = [Vec3 x y z | (x, y, z) <- zip3 [3 .. n + 2] [2 .. n + 1] [1 .. n]]
  :: [Vec3D]
dots = zipWith dot v1s v2s :: [Double]
theMax = maximum dots :: Double
main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
I compile it with GHC 6.12.1 (Ubuntu Linux on an i486 machine):
ghc --make -O2 Vec.hs -prof -auto-all -fforce-recomp
and run
Vec +RTS -p
Looking at the Vec.prof file,
COST CENTRE  MODULE     %time  %alloc
v2s          Main        30.9    36.5
v1s          Main        27.9    31.3
dots         Main        27.2    27.0
CAF          GHC.Float    4.4     5.2
vy           Main         3.7     0.0
vx           Main         2.9     0.0
theMax       Main         2.2     0.0
I see that the functions vx and vy take a significant portion of the time.
Why is that? I thought that the SPECIALIZE INLINE pragma would make those functions go away.
When I instead use a non-polymorphic type,
data Vec3D = Vec3D {vx, vy, vz :: !Double} deriving Show
the functions vx, vy, and vz do not show up as cost centers.
I suspect this is a side effect of using -auto-all, which inhibits many optimizations GHC would normally perform, including inlining. I suspect the difference in your non-polymorphic version is actually due to vx, vy, and vz being defined via record syntax rather than because of polymorphism (but I could be wrong about this).
Instead of using -auto-all, try either adding an export list to the module and compiling with "-auto", or manually setting cost centers via SCC pragmas. I usually use SCC pragmas anyway because I often want to set them on let-bound functions, which -auto-all won't do.
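As a rough sketch of the SCC approach, assuming the same Vec3 code as in the question: the cost-centre names "dots" and "theMax" below are just labels picked for illustration, and the list construction is rewritten with zipWith3 for brevity. The idea is to compile with -prof but without -auto-all, so that only the annotated expressions get cost centres and the small accessors stay free to be inlined. I have not checked what the resulting profile looks like on GHC 6.12.1.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}
-- Compile without -auto-all, for example:
--   ghc --make -O2 Vec.hs -prof -fforce-recomp
module Main (main) where

data Vec3 a = Vec3 !a !a !a

vx :: Vec3 a -> a
vx (Vec3 x _ _) = x
{-# INLINE vx #-}

vy :: Vec3 a -> a
vy (Vec3 _ y _) = y
{-# INLINE vy #-}

vz :: Vec3 a -> a
vz (Vec3 _ _ z) = z
{-# INLINE vz #-}

dot :: Num a => Vec3 a -> Vec3 a -> a
dot u v = (vx u * vx v) + (vy u * vy v) + (vz u * vz v)
{-# INLINE dot #-}

n :: Double
n = 1000000

v1s, v2s :: [Vec3 Double]
v1s = zipWith3 Vec3 [1 .. n] [2 .. n + 1] [3 .. n + 2]
v2s = zipWith3 Vec3 [3 .. n + 2] [2 .. n + 1] [1 .. n]

-- Manual cost centres: only these two expressions are attributed costs.
dots :: [Double]
dots = {-# SCC "dots" #-} zipWith dot v1s v2s

theMax :: Double
theMax = {-# SCC "theMax" #-} maximum dots

main :: IO ()
main = putStrLn $ "theMax: " ++ show theMax
```

Running the binary with +RTS -p as before should then produce a .prof file that lists the hand-placed cost centres instead of one per top-level function.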
I could not figure out how to make comments to the replies, so I'm making comments in this answer.
First, thanks for your answers.
FUZxxl: I tried -ddump-core, and got an error message that -ddump-core was an unrecognized flag. Perhaps you meant -ddump-simpl, which the book Real World Haskell recommended using, but I'm afraid I don't know how to read the output. I looked in the output file for "vx", etc, but never saw them. I guess I should learn how to read core. Are there any good guides for that?
John: According to GHC's flag reference documentation, if I'm reading it correctly, both -auto and -auto-all are supposed to add _scc_s to functions not marked INLINE. To see if -auto would work for me, I created another test case in which the Vec3 code was in a separate file/module, with Vec3(Vec3), vx, vy, vz, and dot exported. I imported this module into a Main.hs file. Compiling these with -auto, I still saw vx, vy, vz in the .prof file.
Re: your comment that the difference could be due to record syntax instead of polymorphism, I believe that the difference is more likely due to polymorphism, because when I defined
data Vec3 a = Vec3 {vx, vy, vz :: !a}
vx, vy and vz still showed up in the .prof file.
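For reference, the separate-module test described above could look roughly like the sketch below. The file name Vec3.hs is an assumption, and the plain INLINE pragmas are a deliberate change from the SPECIALIZE INLINE pragmas in the original code: since the flag reference talks about functions "not marked INLINE", this is one experiment to try, not a confirmed fix. I have not verified that it removes vx, vy, and vz from the profile.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}
-- Vec3.hs (hypothetical file name), imported from Main.hs.
-- The export list is what -auto looks at when placing cost centres.
module Vec3 (Vec3(Vec3), vx, vy, vz, dot) where

data Vec3 a = Vec3 !a !a !a

-- Plain INLINE instead of SPECIALIZE INLINE: an experiment, on the
-- assumption that this is what "marked INLINE" means in the flag reference.
{-# INLINE vx #-}
vx :: Vec3 a -> a
vx (Vec3 x _ _) = x

{-# INLINE vy #-}
vy :: Vec3 a -> a
vy (Vec3 _ y _) = y

{-# INLINE vz #-}
vz :: Vec3 a -> a
vz (Vec3 _ _ z) = z

{-# INLINE dot #-}
dot :: Num a => Vec3 a -> Vec3 a -> a
dot u v = (vx u * vx v) + (vy u * vy v) + (vz u * vz v)
```

Main.hs would then import Vec3 and keep the v1s, v2s, dots, and theMax definitions unchanged.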
Tad