A virtual function returning a small structure - return value vs output parameter?

Asked 2023-03-09 01:56

I have a virtual function in hotspot code that needs to return a structure as a result. I have these two options:

virtual Vec4 generateVec() const = 0; // return value

virtual void generateVec(Vec4& output) const = 0; // output parameter

My question is: is there generally any difference in the performance of these functions? I'd assume the second one is faster, because it does not involve copying data on the stack. However, the first one is often much more convenient to use. If the first one is still slightly slower, would this be measurable at all? Am I being too obsessive? :)

Let me stress that this function will be called millions of times every second, but also that the structure Vec4 is small: 16 bytes.

As has been said, try them out - but you will quite possibly find that Vec4 generateVec() is actually faster. Return value optimization will elide the copy operation, whereas void generateVec(Vec4& output) may cause an unnecessary initialisation of the output parameter.
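To make the copy-elision point concrete, here is a minimal sketch that counts copies for both call styles. The instrumented Vec4, the Generator/Ones classes, the g_copies counter and the helper functions are all hypothetical names invented for this illustration. Note that giving Vec4 a user-written copy constructor makes it non-trivially copyable, so this shows what the language does with copies, not how a plain 16-byte struct is passed in registers.

```cpp
#include <cassert>

// Hypothetical global counter of Vec4 copy operations (illustration only).
int g_copies = 0;

// Hypothetical stand-in for the question's Vec4, instrumented to count copies.
struct Vec4 {
    float x, y, z, w;
    Vec4(float x_, float y_, float z_, float w_) : x(x_), y(y_), z(z_), w(w_) {}
    Vec4(const Vec4& o) : x(o.x), y(o.y), z(o.z), w(o.w) { ++g_copies; }
    Vec4& operator=(const Vec4& o) {
        x = o.x; y = o.y; z = o.z; w = o.w;
        ++g_copies;
        return *this;
    }
};

struct Generator {
    virtual ~Generator() = default;
    virtual Vec4 generateVec() const = 0;              // return value
    virtual void generateVec(Vec4& output) const = 0;  // output parameter
};

struct Ones : Generator {
    Vec4 generateVec() const override { return Vec4(1, 1, 1, 1); }
    void generateVec(Vec4& output) const override { output = Vec4(1, 1, 1, 1); }
};

// Number of copies made when the result is returned by value.
int copiesViaReturn(const Generator& g) {
    g_copies = 0;
    Vec4 v = g.generateVec();  // the returned temporary initialises v directly
    (void)v;
    return g_copies;
}

// Number of copies made when the result is written through a reference.
int copiesViaOutput(const Generator& g) {
    g_copies = 0;
    Vec4 v(0, 0, 0, 0);  // the caller has to construct the object first
    g.generateVec(v);    // then the callee assigns over it
    return g_copies;
}

// Small self-check; elision is guaranteed from C++17 and is what
// mainstream compilers do by default in earlier modes as well.
void checkCopyCounts() {
    Ones g;
    assert(copiesViaReturn(g) == 0);
    assert(copiesViaOutput(g) == 1);
}
```

Calling checkCopyCounts() from main is enough to exercise both paths. Whether the extra assignment costs anything measurable for a trivially copyable 16-byte struct is a separate question that only profiling can answer.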

Is there any way you can avoid making the function virtual? If you're calling it millions of times a sec that extra level of indirection is worth looking at.
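One possible way to drop the indirection, assuming the concrete generator type is known at compile time (which may not hold in the asker's design), is to pass it as a template parameter. ConstantGenerator and sumFirstComponents below are hypothetical names used only for this sketch.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Hypothetical concrete generator with a plain, non-virtual member function.
struct ConstantGenerator {
    float value;
    Vec4 generateVec() const { return Vec4{value, value, value, value}; }
};

// The generator type is a template parameter, so the call below is resolved
// at compile time and the compiler is free to inline it into the loop.
template <typename Gen>
float sumFirstComponents(const Gen& gen, int iterations) {
    assert(iterations >= 0);
    float sum = 0.0f;
    for (int i = 0; i < iterations; ++i) {
        sum += gen.generateVec().x;
    }
    return sum;
}
```

For example, sumFirstComponents(ConstantGenerator{2.0f}, 4) sums four generated x components without any virtual call. The trade-off is that the hot loop is instantiated once per generator type, and the choice of generator can no longer be made at run time inside that loop.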

Code called millions of times per second implies you really do need to optimize for speed.

Depending on how complex the bodies of the derived generateVec implementations are, the difference between the two may be unnoticeable or could be massive.

Best bet is to try them both and profile to see if you need to worry about optimizing this particular aspect of the code.

Feeling a bit bored, so I came up with this:

#include <iostream>
#include <ctime>
#include <cstdlib>
using namespace std;

struct A {
    int n[4];
    A() {
        n[0] = n[1] = n[2] = n[3] = rand();
    }
};

A f1() {
    return A();
}

void f2( A & a ) {   // writes through the reference, so it returns nothing
    a = A();
}

const unsigned long BIG = 100000000;

int main() {
    unsigned int sum =  0;
    A a;
    clock_t t = clock();
    for ( unsigned int i = 0; i < BIG; i++ ) {
        a = f1();
        sum += a.n[0];
    }
    cout << clock() - t << endl;
    t = clock();
    for ( unsigned int i = 0; i < BIG; i++ ) {
        f2( a );
        sum += a.n[0];
    }
    cout << clock() - t << endl;
    return sum & 1;
}

With -O2 optimisation, there is no significant difference between the two.

The first solution may well be the faster one.

A very nice article:
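The benchmark above calls free functions, whereas the question is about virtual ones. A rough sketch of the same measurement through a virtual interface might look like this. The Generator, Counter and Timing types and the two timing functions are hypothetical names for illustration, and std::chrono::steady_clock is swapped in for clock().

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct Vec4 { std::uint32_t n[4]; };  // 16 bytes, as in the question

struct Generator {
    virtual ~Generator() = default;
    virtual Vec4 generateVec() const = 0;
    virtual void generateVec(Vec4& output) const = 0;
};

// Hypothetical implementation: yields 0, 1, 2, ... so the work cannot be
// folded into a constant by the optimiser.
struct Counter : Generator {
    mutable std::uint32_t next = 0;
    Vec4 generateVec() const override {
        std::uint32_t v = next++;
        return Vec4{{v, v, v, v}};
    }
    void generateVec(Vec4& output) const override {
        std::uint32_t v = next++;
        output = Vec4{{v, v, v, v}};
    }
};

struct Timing {
    long long nanoseconds;   // elapsed wall-clock time
    std::uint32_t checksum;  // keeps the loop result observable
};

Timing timeReturnValue(const Generator& g, std::uint32_t iterations) {
    assert(iterations > 0);
    std::uint32_t sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        sum += g.generateVec().n[0];
    }
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{static_cast<long long>(ns.count()), sum};
}

Timing timeOutputParameter(const Generator& g, std::uint32_t iterations) {
    assert(iterations > 0);
    std::uint32_t sum = 0;
    Vec4 out{};  // constructed once, overwritten on every call
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        g.generateVec(out);
        sum += out.n[0];
    }
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{static_cast<long long>(ns.count()), sum};
}
```

Both functions take the generator by base-class reference, but a compiler that can see the concrete type may still devirtualise the calls, so treat any numbers from a sketch like this as indicative only and measure inside the real application.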

http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

Just out of curiosity, I wrote two similar functions (using an 8-byte data type) to check their assembly code.

long long int ret_val()
{
    long long int tmp(1);
    return tmp;
}

// ret_val() assembly
.globl _Z7ret_valv
        .type   _Z7ret_valv, @function
_Z7ret_valv:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        subl    $16, %esp
        movl    $1, -8(%ebp)
        movl    $0, -4(%ebp)
        movl    -8(%ebp), %eax
        movl    -4(%ebp), %edx
        leave
        ret
        .cfi_endproc

Surprisingly, the output-parameter version below required a few more instructions:

void output_val(long long int& value)
{
    long long int tmp(2);
    value = tmp;
}

// output_val() assembly
.globl _Z10output_valRx
        .type   _Z10output_valRx, @function
_Z10output_valRx:
.LFB1:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        subl    $16, %esp
        movl    $2, -8(%ebp)
        movl    $0, -4(%ebp)
        movl    8(%ebp), %ecx
        movl    -8(%ebp), %eax
        movl    -4(%ebp), %edx
        movl    %eax, (%ecx)
        movl    %edx, 4(%ecx)
        leave
        ret
        .cfi_endproc

These functions were called from test code as follows:

 long long val = ret_val();

 long long val2;
 output_val(val2);

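Compiled with gcc. The listings are 32-bit x86 and look unoptimised (each temporary is stored to the stack first), so an optimised build may differ.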
