Can function overhead slow down a program by a factor of 50x?
I have a code that I'm running for a project. It is O(N^2), where N is 200 in my case. There is an algorithm that turns this O(N^2) into O(N log N). This means that, with the new algorithm, it should be ~100 times faster. However, I'm only getting a 2-fold increase (i.e., 2x faster).
I'm trying to narrow things down to see if I messed something up, or whether it's something inherent to the way I coded this program. For starters, I have a lot of function-call overhead within nested classes. For example, I have a lot of this (within many loops):
energy = globals->pair_style->LJ->energy();
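In context it sits inside nested loops over pairs, roughly like this (the loop bounds and the accumulation are simplified stand-ins for my real code):

    double total = 0.0;
    for (int i = 0; i < natoms; ++i) {
        for (int j = i + 1; j < natoms; ++j) {
            // one call through three pointers for every pair
            total += globals->pair_style->LJ->energy();
        }
    }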
Since I'm getting the right results when it comes to the actual data, just not the expected speedup, I'm wondering whether function overhead can really cause that much of a slowdown, by as much as 50-fold.
Thanks!
Firstly, your interpretation that O(N log N) is ~100 times faster than O(N^2) for N=200 is incorrect. Big-O notation deals with upper bounds and behaviour in the limit, and doesn't account for any multiplicative constants in the running time.
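Even setting the constants aside, the raw operation counts don't give 100x at N=200. A quick sanity check (assuming a base-2 logarithm; a different base only changes the constant):

    #include <cmath>
    #include <cstdio>

    int main() {
        double n = 200.0;
        double quadratic = n * n;            // 40,000 "operations"
        double nlogn = n * std::log2(n);     // ~1,529 "operations"
        std::printf("naive ratio: %.1f\n", quadratic / nlogn);  // prints ~26.2
    }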
Secondly, yes, on modern hardware function calls tend to be relatively expensive due to pipeline disruption. To find out how big a factor this is in your case, you'd have to come up with some microbenchmarks.
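For example, here is a minimal sketch of such a microbenchmark, using made-up stand-ins for the nested classes in the question. Note that a real measurement would need to defeat the optimizer more carefully (e.g. by making energy() do real work), since at -O2 a compiler may inline both loops into the same code:

    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-ins for the nested classes in the question.
    struct LJ        { double energy() const { return 1.0; } };
    struct PairStyle { LJ* lj = new LJ(); };
    struct Globals   { PairStyle* pair_style = new PairStyle(); };

    int main() {
        Globals* globals = new Globals();
        const long iters = 100000000L;

        auto t0 = std::chrono::steady_clock::now();
        double sum = 0.0;
        for (long i = 0; i < iters; ++i)
            sum += globals->pair_style->lj->energy();  // full pointer chain every time
        auto t1 = std::chrono::steady_clock::now();

        LJ* lj = globals->pair_style->lj;              // chain hoisted out of the loop
        for (long i = 0; i < iters; ++i)
            sum += lj->energy();
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("chained: %lld ms, hoisted: %lld ms (sum=%g)\n",
            (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<ms>(t2 - t1).count(), sum);
    }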
The absolute biggest hit is cache misses. An L1 cache miss is relatively cheap, but when you miss in L2 (or L3, if you have it) you may be losing hundreds or even thousands of cycles to the incoming stall.
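To see the effect, here is a small sketch that reads the same memory in cache-friendly and cache-hostile order (the matrix size is arbitrary, chosen to be larger than a typical L3 cache; the strided loop is usually several times slower on common hardware):

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 2048;                       // 2048x2048 doubles = 32 MB
        std::vector<double> m(n * n, 1.0);
        double sum = 0.0;

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i)               // row order: walks memory contiguously
            for (int j = 0; j < n; ++j)
                sum += m[i * n + j];
        auto t1 = std::chrono::steady_clock::now();

        for (int j = 0; j < n; ++j)               // column order: 16 KB stride, misses cache
            for (int i = 0; i < n; ++i)
                sum += m[i * n + j];
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("row order: %lld ms, column order: %lld ms (sum=%g)\n",
            (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<ms>(t2 - t1).count(), sum);
    }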
Thing is, though, this may only be part of the problem. Do not optimise your code until you have profiled it. Identify the slow areas and then figure out WHY they are slow. Once you understand why it's running slowly, you have a good chance of optimising it.
As an aside, O notation is very handy but is not the be-all and end-all. I've seen O(n^2) algorithms work significantly faster than O(n log n) ones for small amounts of data (and small may mean less than several thousand) due to the fact that they cache far more effectively.
The important thing about Big O notation is that it only specifies the limit of the execution time as the data set size increases; any constants are thrown away. While O(N^2) is indeed slower than O(N log N) asymptotically, the actual run times might be N^2 vs. 1000N log N; that is, the O(N^2) algorithm can be faster than the O(N log N) one on some data sets.
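Taking those hypothetical constants literally, a quick computation of where N^2 actually overtakes 1000N log N:

    #include <cmath>
    #include <cstdio>

    int main() {
        // With run times N^2 vs. 1000*N*log2(N), the O(N^2) algorithm wins
        // until N^2 exceeds 1000*N*log2(N), i.e. until N > 1000*log2(N).
        for (long n = 2; ; ++n) {
            if ((double)n * n > 1000.0 * n * std::log2((double)n)) {
                std::printf("O(N^2) only loses beyond N = %ld\n", n);  // ~13,747
                break;
            }
        }
    }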
Without more details it's hard to say more: yes, function calls do have a fair amount of overhead, and that might be why you're not seeing a bigger increase in performance; or it might just be that your O(N log N) implementation doesn't perform as well on a data set of your size.
I've worked on image processing algorithms, and calling a function per pixel (i.e., 307,200 calls for a 640x480 image) can significantly reduce performance. Try declaring your function inline, or making the function a macro. This can quickly show you whether it is because of function calls. Try looking at some profiling tools. VS 2010 comes with some nice tools, or else there is also Intel VTune and GlowCode. They can help show where you are spending time.
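A sketch of the two alternatives, using a hypothetical per-pixel brighten pass (note that at -O2 most compilers will inline a small function like this anyway, which is exactly what the experiment tests):

    #include <cstdint>
    #include <cstdio>

    // 'inline' is only a hint; modern compilers usually inline
    // small functions like this at -O2 regardless.
    inline uint8_t brighten(uint8_t p) { return p < 245 ? p + 10 : 255; }

    // The macro alternative: no call overhead even at -O0, but it loses
    // type checking and evaluates its argument twice.
    #define BRIGHTEN(p) ((p) < 245 ? (p) + 10 : 255)

    int main() {
        const int w = 640, h = 480;
        static uint8_t img[640 * 480];        // zero-initialized test image
        for (int i = 0; i < w * h; ++i)       // 307,200 per-pixel invocations
            img[i] = brighten(img[i]);        // or: img[i] = BRIGHTEN(img[i]);
        std::printf("%d\n", img[0]);
    }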
IMHO, I don't think that ~1,600 function calls (200 log 200) should reduce performance much at all.
I suggest profiling it using gprof (requires compile-time instrumentation) or valgrind --tool=callgrind together with kcachegrind, an excellent tool with excellent visualization. The big FAQ topic on profiling is here: How can I profile C++ code running in Linux?
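Typical invocations, assuming a g++ build (gmon.out and the callgrind.out.<pid> file are written to the working directory):

    g++ -pg -O2 -o prog prog.cpp       # -pg adds gprof's compile-time instrumentation
    ./prog                             # writes gmon.out on exit
    gprof ./prog gmon.out > profile.txt

    valgrind --tool=callgrind ./prog   # writes callgrind.out.<pid>
    kcachegrind callgrind.out.<pid>    # browse the call graph and costs graphically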