Performance of std::pow - cache misses?
I've been trying to optimize a numeric program of mine, and have run into something of a mystery. I'm looping over code that performs thousands of floating point operations, only one of which is a call to pow. Nevertheless, that call takes 5% of the time... That's not necessarily a critical issue, but it is odd, so I'd like to understand what's happening.
When I profiled for cache misses, VS.NET 2010RC's profiler reports that virtually all cache misses are occurring in std::pow... so... what's up with that? Is there a faster alternative? I tried powf, but that's only slightly faster; it's still responsible for an abnormal number of cache misses.
Why would a basic function like pow cause cache-misses?
Edit: this is not managed code. /Oi intrinsics are enabled, but the compiler may at its option ignore that. Replacing pow(x,y) by exp(y*log(x)) has similar performance - just now all the cache misses are in the log function.
Yes, it's slow. As for why in detail, someone who feels more confident can try to explain.
Want to speed it up? Here: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/
Can you give more information on the 'x' as well as the environment where pow is evaluated?
What you are seeing might be the hardware prefetchers at work. Depending on the profiler, the 'cost' attributed to individual assembly instructions might be incorrect, and such misattribution should be even more frequent around long-latency instructions like the ones needed to evaluate pow.
On top of that, I would use a dedicated profiler like VTune/PTU rather than the one available in any Visual Studio version.
If you replace std::pow(var) with another function, like std::max(var, var), does it still take up 5%? Do you still get all the cache misses?
I'm guessing no on time and yes on cache misses. Calculating powers is slower than many other operations (which are you using?). Calling out to code that's not in the cache will cause a cache miss no matter which function it is.
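The substitution experiment can be sketched as a small timing harness. Everything here (LoopResult, time_loop, the use_* wrappers, the input pattern and the fixed exponent 1.5) is hypothetical scaffolding, not the asker's code. It only measures wall-clock time, so cache misses still have to be read from a profiler.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical result type: elapsed wall-clock seconds plus a checksum that
// keeps the compiler from optimizing the loop away.
struct LoopResult {
    double seconds;
    double checksum;
};

// Runs op(x, 1.5) for `iterations` values of x in [1, 2) and times the loop.
template <typename F>
LoopResult time_loop(F op, std::size_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const double x = 1.0 + static_cast<double>(i % 1000) * 0.001;
        sum += op(x, 1.5);
    }
    const auto stop = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(stop - start).count(), sum };
}

// The three candidates discussed in the thread.
inline double use_pow(double x, double y)     { return std::pow(x, y); }
inline double use_exp_log(double x, double y) { return std::exp(y * std::log(x)); }
inline double use_max(double x, double y)     { return x > y ? x : y; }  // cheap stand-in
```

Running time_loop(use_pow, n), time_loop(use_exp_log, n) and time_loop(use_max, n) with the same n and comparing the seconds fields gives a rough idea of how much of the 5% is the power computation itself. An isolated loop like this has a very different cache footprint from the real program, so treat the numbers as a hint only.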
If your code involves some heavy number-crunching, I wouldn't be too surprised that std::pow is consuming 5% of the running time. Many numeric operations are very fast, so a slightly slower operation like std::pow will appear to take more time relative to the other already-fast operations. (That would also account for why you didn't see much improvement switching to std::powf.)
The cache misses are somewhat more puzzling, and it's hard to offer an explanation without more data. One possibility is that your other code is so memory-intensive that it gobbles up all the available cache, in which case it wouldn't be completely surprising that std::pow is taking all the punches on the cache misses.