A virtual function returning a small structure - return value vs output parameter?
I have a virtual function in hotspot code that needs to return a structure as a result. I have these two options:
virtual Vec4 generateVec() const = 0; // return value
virtual void generateVec(Vec4& output) const = 0; // output parameter
My question is, is there generally any difference in the performance of these functions? I'd assume the second one is faster, because it does not involve copying data on the stack. However, the first one is often much more convenient to use. If the first one is still slightly slower, would this be measurable at all? Am I too obsessed :)
Let me stress that this function will be called millions of times every second, but also that the Vec4 structure is small: 16 bytes.
As has been said, try them out - but you will quite possibly find that Vec4 generateVec() is actually faster. Return value optimization will elide the copy operation, whereas void generateVec(Vec4& output) may cause an unnecessary initialisation of the output parameter.
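To make the copy-elision point concrete, here is a minimal sketch. It assumes a stand-in Vec4 instrumented with a copy counter, plus hypothetical Generator and Constant classes; none of these come from the question. Under C++17 the elision in the return-value form is guaranteed; earlier standards only permit it, although mainstream compilers perform it by default.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's Vec4, instrumented to count copies.
struct Vec4 {
    float x, y, z, w;
    static int copies;  // copy constructions plus copy assignments
    Vec4() : x(0), y(0), z(0), w(0) {}
    Vec4(float x_, float y_, float z_, float w_) : x(x_), y(y_), z(z_), w(w_) {}
    Vec4(const Vec4& o) : x(o.x), y(o.y), z(o.z), w(o.w) { ++copies; }
    Vec4& operator=(const Vec4& o) {
        x = o.x; y = o.y; z = o.z; w = o.w;
        ++copies;
        return *this;
    }
};
int Vec4::copies = 0;

// Hypothetical interface offering both forms from the question.
struct Generator {
    virtual ~Generator() {}
    virtual Vec4 generateVec() const = 0;              // return value
    virtual void generateVec(Vec4& output) const = 0;  // output parameter
};

struct Constant : Generator {
    Vec4 generateVec() const override { return Vec4(1, 2, 3, 4); }
    void generateVec(Vec4& output) const override { output = Vec4(1, 2, 3, 4); }
};

// Copies made when a new object is initialised from the return value.
int copiesWithReturnValue(const Generator& g) {
    Vec4::copies = 0;
    Vec4 v = g.generateVec();  // result is constructed directly in v
    assert(v.w == 4);
    return Vec4::copies;
}

// Copies made by the output-parameter form.
int copiesWithOutputParameter(const Generator& g) {
    Vec4::copies = 0;
    Vec4 v;            // has to be initialised before the call
    g.generateVec(v);  // the callee assigns into the caller's object
    assert(v.w == 4);
    return Vec4::copies;
}
```

Note that the counter makes this Vec4 non-trivially copyable, so the sketch shows where copies happen in the language model, not what the generated code for a plain 16-byte struct looks like. That still has to be measured.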
Is there any way you can avoid making the function virtual? If you're calling it millions of times a second, that extra level of indirection is worth looking at.
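One possible way to drop the indirection, sketched under the assumption that the concrete generator type is known at compile time where the hot loop lives. The names UnitX and sumX are hypothetical and only serve as illustration.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Hypothetical concrete generator: no virtual functions, so calls can be inlined.
struct UnitX {
    Vec4 generateVec() const { return Vec4{1.0f, 0.0f, 0.0f, 0.0f}; }
};

// The generator type is a template parameter, so the call is resolved at
// compile time and the loop body contains no indirect call.
template <typename Gen>
float sumX(const Gen& gen, int count) {
    assert(count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += gen.generateVec().x;
    }
    return total;
}
```

This only helps if the type really is fixed at the call site. When it is only known at run time, the virtual call stays, and whether it matters is again something to profile.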
Code called millions of times per second implies you really do need to optimize for speed.
Depending on how complex the bodies of the derived generateVec implementations are, the difference between the two may be unnoticeable or could be massive.
Best bet is to try them both and profile to see if you need to worry about optimizing this particular aspect of the code.
Feeling a bit bored, so I came up with this:
#include <iostream>
#include <ctime>
#include <cstdlib>
using namespace std;

// 16-byte structure, filled with a random value on construction
struct A {
    int n[4];
    A() {
        n[0] = n[1] = n[2] = n[3] = rand();
    }
};

// return value
A f1() {
    return A();
}

// output parameter
void f2( A & a ) {
    a = A();
}

const unsigned long BIG = 100000000;

int main() {
    unsigned int sum = 0;
    A a;

    // time the return-value version
    clock_t t = clock();
    for ( unsigned int i = 0; i < BIG; i++ ) {
        a = f1();
        sum += a.n[0];
    }
    cout << clock() - t << endl;

    // time the output-parameter version
    t = clock();
    for ( unsigned int i = 0; i < BIG; i++ ) {
        f2( a );
        sum += a.n[0];
    }
    cout << clock() - t << endl;

    // use sum so the loops cannot be optimised away
    return sum & 1;
}
With -O2 optimisation, the two loops show no significant difference in run time.
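The test above calls free functions directly. A variant closer to the question's setup, with every call going through a virtual interface and timing done with std::chrono, might look like the sketch below. Generator, Counting, Result and the two timing functions are hypothetical names, and the generator body is deliberately trivial so that call overhead dominates.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct Vec4 { float x, y, z, w; };

struct Generator {
    virtual ~Generator() {}
    virtual Vec4 generateVec() const = 0;              // return value
    virtual void generateVec(Vec4& output) const = 0;  // output parameter
};

// Hypothetical implementation: both forms produce the same sequence of values.
struct Counting : Generator {
    mutable std::uint32_t n = 0;
    Vec4 generateVec() const override {
        float f = static_cast<float>(++n & 0xFF);
        return Vec4{f, f, f, f};
    }
    void generateVec(Vec4& output) const override {
        float f = static_cast<float>(++n & 0xFF);
        output = Vec4{f, f, f, f};
    }
};

struct Result {
    double seconds;          // wall-clock time for the whole loop
    std::uint64_t checksum;  // keeps the loop from being optimised away
};

Result timeReturnValue(const Generator& g, std::uint32_t calls) {
    assert(calls > 0 && "nothing to measure");
    std::uint64_t sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < calls; ++i) {
        Vec4 v = g.generateVec();
        sum += static_cast<std::uint64_t>(v.x);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return Result{elapsed.count(), sum};
}

Result timeOutputParameter(const Generator& g, std::uint32_t calls) {
    assert(calls > 0 && "nothing to measure");
    std::uint64_t sum = 0;
    Vec4 v = {0.0f, 0.0f, 0.0f, 0.0f};
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < calls; ++i) {
        g.generateVec(v);
        sum += static_cast<std::uint64_t>(v.x);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return Result{elapsed.count(), sum};
}
```

Call each function with a fresh Counting object and compare the seconds fields; matching checksums confirm that both loops did the same work. If the compiler can see the dynamic type at the call site, it may devirtualize the calls, so in a real measurement the generator should come from somewhere the optimizer cannot see through, such as another translation unit.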
The first solution may well be the faster one.
A very nice article:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Just out of curiosity, I wrote two similar functions (using an 8-byte data type) to check their assembly code.
long long int ret_val()
{
    long long int tmp(1);
    return tmp;
}
// ret_val() assembly
.globl _Z7ret_valv
.type _Z7ret_valv, @function
_Z7ret_valv:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
subl $16, %esp
movl $1, -8(%ebp)
movl $0, -4(%ebp)
movl -8(%ebp), %eax
movl -4(%ebp), %edx
leave
ret
.cfi_endproc
Surprisingly, the output-parameter method below required a few more instructions:
void output_val(long long int& value)
{
    long long int tmp(2);
    value = tmp;
}
// output_val() assembly
.globl _Z10output_valRx
.type _Z10output_valRx, @function
_Z10output_valRx:
.LFB1:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
subl $16, %esp
movl $2, -8(%ebp)
movl $0, -4(%ebp)
movl 8(%ebp), %ecx
movl -8(%ebp), %eax
movl -4(%ebp), %edx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
leave
ret
.cfi_endproc
These functions were called in the test code as:
long long val = ret_val();
long long val2;
output_val(val2);
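Compiled with gcc. The listings are 32-bit x86 and appear to be unoptimized output (every value goes through the stack), so an optimized build may look quite different.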