Code runs ten times slower when all I did was move the loop code into a function
I was running the code below, which essentially does very little. It just adds 2 and 4 100 million times and outputs the runtime.
#include <ctime>
#include <iostream>
using namespace std;

void add() {
    int tot = 2 + 4;
}

int main() {
    clock_t t = clock();  // start time, in clock ticks
    int count = 0;
    while (count < 100000000) {
        int tot = 2 + 4;
        count++;
    }
    cout << endl << "runtime = " << fixed << (double)(clock() - t) / CLOCKS_PER_SEC << "s" << endl;
}
But I was interested to see the time difference when doing the exact same thing but calling a function. So I replaced the line "int tot = 2+4" with "add()".
I was expecting the second runtime to be slightly longer, but it was a lot longer: the first implementation took 0.3 s and the second took 3 s.
I understand that calling the function requires using the stack to store the return address and local data. But it must be doing a lot more than this?
It would be great if someone could explain what exactly causes the big difference in runtime, or whether I am doing something silly.
As mentioned already by Seth, inline functions would probably get optimized.
In the 1st case (with basic optimizations on), the compiler will most likely resolve 2 + 4 to 6 at compile time instead of adding those 2 numbers on every iteration, and simply emit a single
mov eax, 6 ; eax is just an example register here, or
mov tot_addr, 6 ; store the constant directly to tot's memory location
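The folding described above can also be made visible at the language level. This is only a minimal sketch: fold_example is a hypothetical name that does not appear in the original code, and it shows that the compiler is able to evaluate 2 + 4 without running anything, not what any particular compiler emits for the loop in the question.

```cpp
// Sketch only: fold_example is a hypothetical name, not from the question.
// A constexpr function can be evaluated entirely by the compiler.
constexpr int fold_example() {
    return 2 + 4;  // the same expression as in the question
}

// This line compiles only if the call can be evaluated at compile time,
// so no addition has to happen at run time to produce the 6.
static_assert(fold_example() == 6, "2 + 4 is evaluated at compile time");
```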
In the 2nd case since it is a function call the system has to
push 4 ; first operand, assuming 4-byte ints
push 2 ; second operand, written out to make clear that both variables have to be pushed
call add
or something along these lines, because a stack frame has to be set up for that function (the push for the return value and similar details are omitted for simplicity). When the function returns, the RET sets the instruction pointer back to the caller, and the stack pointer has to be moved back to where it was before that first push. As you can see, this is more expensive than doing a simple
mov eax, 4
add eax, 2
which is likely what you would get for a simple, non-optimized add.
EDIT: Here's some more information about inlining the function. When a function is inlined, the compiler takes the instructions the function would execute and places them directly where the function is referenced, instead of emitting a CALL instruction and setting up a function call. For instance, instead of
mov eax, 4
mov ecx, 2
push 4 ; size for return value
push eax
push ecx
call add
You will end up with
mov eax, 4
mov ecx, 2
add eax, ecx
In code terms:
int a = 4;
int b = 2;
int res = add(a, b);
would become
int a = 4;
int b = 2;
int res = a + b;
assuming you inline the add function.
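As a minimal sketch of the two forms above, assuming a two-argument add like the one in this answer (the question's own add() takes no arguments), the names with_call and by_hand below are hypothetical. Note that the inline keyword is only a hint; the compiler decides whether the call is actually expanded in place.

```cpp
// Sketch only: with_call and by_hand are hypothetical names for illustration.
// 'inline' is a hint; the compiler makes the final decision about inlining.
inline int add(int a, int b) {
    return a + b;
}

// The form written with a function call.
int with_call() {
    int a = 4;
    int b = 2;
    int res = add(a, b);
    return res;
}

// What the same code looks like once the body of add is substituted in place.
int by_hand() {
    int a = 4;
    int b = 2;
    int res = a + b;
    return res;
}
```

Both functions compute the same value; the only difference an inlining compiler removes is the call itself.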
Relative to other simple operations, function calls are expensive (more expensive than you might think), for just the reason you mention. Try declaring your function inline and/or enabling optimisations to get back the performance.
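One way to try this is a small timing harness, sketched below under some assumptions. The names sink, time_direct, time_call and report are hypothetical, and the volatile variable is an addition that is not in the original code: it is there so that an optimizing compiler cannot delete the loop body entirely, which would make both timings meaningless.

```cpp
#include <chrono>
#include <iostream>

// Assumption: a volatile sink (not in the original code) keeps the store
// from being optimized away, so the loops still do observable work.
volatile int sink = 0;

// Same work as the question's add(), but writing to the sink.
void add() {
    sink = 2 + 4;
}

// Times n iterations with the statement written directly in the loop.
// Returns elapsed seconds.
double time_direct(long n) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        sink = 2 + 4;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Times n iterations that go through add() instead.
// Returns elapsed seconds.
double time_call(long n) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        add();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Prints both timings for the same iteration count.
void report(long n) {
    std::cout << "direct: " << time_direct(n) << "s\n"
              << "call:   " << time_call(n) << "s\n";
}
```

Calling report(100000000) from main in a build without optimizations (for example -O0 with GCC or Clang) and again in an optimized build (for example -O2) should show whether the gap between the two loops persists once the compiler is allowed to inline add().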
A function call can indeed cause a lot more work for the CPU than simply pushing a return address and parameters.
On some CPUs, you'll get a pipeline stall, and maybe even an i-cache miss while fetching the next instruction. These could easily cause even more than the 10x slowdown you've noted.
That's why compilers go to great lengths to optimize away function calls.