Performance difference between multiple pointer dereferences vs references
This was an interview question I was asked a few months back:
Which of the following functions will execute faster, Foo1 or Foo2?
void Foo1(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            ((*array) + i)->Operation1();
            ((*array) + i)->Operation2();
            ((*array) + i)->Operation3();
            ((*array) + i)->Operation4();
            ((*array) + i)->Operation5();
            ((*array) + i)->Operation6();
        }
    }
}
void Foo2(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            SomeObjectArray& obj = *((*array) + i);
            obj.Operation1();
            obj.Operation2();
            obj.Operation3();
            obj.Operation4();
            obj.Operation5();
            obj.Operation6();
        }
    }
}
Note that this is from memory, so I don't remember the code exactly, but the general idea is the same: one function uses the pointer directly while the other binds a reference (it may have had a pointer to an array like the code above, but I don't remember exactly). I said I wasn't sure and would have to profile the code to find out, but that if I had to guess, Foo2 'may' be faster. They were not impressed...
This nagged me a couple of times here and there when I ran across code similar to this (or wrote it), and I would wonder what I should do in such a case.
I know that...
- This is a micro optimization
- The compiler will most likely optimize it
EDIT: I have changed the code slightly so that now it is checking for a NULL pointer.
I think this is a pretty interesting question, and I saw a lot of speculation about what a compiler might do, but I wanted to take a closer look and find out for sure. So I took e.James' program and ran it through GCC to get the assembly. I should say that I don't really know assembly, so somebody correct me if I'm wrong, but I think we can reasonably infer what's going on. :)
Compiling with -O0 (no optimization)
For Foo1, we see that the array offset is calculated before each function call:
movl 8(%ebp), %eax
movl (%eax), %edx
movl -4(%ebp), %eax
leal (%edx,%eax), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
That's repeated for all six method calls, just with different method names. Foo2 has a little bit of setup code to get the reference:
movl 8(%ebp), %eax
movl (%eax), %edx
movl -4(%ebp), %eax
leal (%edx,%eax), %eax
movl %eax, -8(%ebp)
And then six of these, which is just a copy of the saved address onto the stack and a function call:
movl -8(%ebp), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Pretty much what we would expect with no optimization. The output was
Foo1: 18472
Foo2: 17684
Compiling with -O1 (minimal optimization)
Foo1 is a little more efficient, but still adds up the array offset before each call:
movl %esi, %eax
addl (%ebx), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Foo2 saves the computed address in %ebx (addl (%edi), %ebx), and then makes six calls like this:
movl %ebx, (%esp)
call __ZN10SomeObject10Operation1Ev
The timings here were
Foo1: 4979
Foo2: 4977
Compiling with -O2 (moderate optimization)
When compiling with -O2, GCC just got rid of the whole thing, and each call to Foo1 or Foo2 simply resulted in adding 594 to dummy (99 increments * 6 calls = 594 increments):
imull $594, %eax, %eax
addl %eax, _dummy
There were no calls to the object's methods, although those methods remained in the code. As we might expect, the times here were
Foo1: 1
Foo2: 0
I think this tells us that Foo2 is a little faster without optimizations, but really it's a moot point, because once the compiler starts optimizing, it is just moving a couple of values between the stack and the registers.
Strictly speaking, with no optimizations at all, I'd say Foo2 is faster, because Foo1 has to recalculate the indirect pointer each time; but that's not going to happen with any real compiler.
I'd say the compiler will optimize it and leave it the same.
Looks like there is plenty of room for the compiler here: i and array don't change within the body of each iteration, so it could put the pointer into a register, exactly as it would for the reference.
Likely the compiler will optimize them to be the same, given the common sub-expressions on each line. No guarantees though.
There aren't any rational conclusions you can reach by armchair reasoning like this, with today's compilers and processors. The only way to know is to try it and time it. If someone didn't make that clear in their interview answer, it would be an automatic fail from me.
Neither; any reasonable compiler will happily make them equivalent. You're not talking about NRVO in the depths of template metaprogramming here: this is a simple case of common subexpression elimination, which is an extremely common and relatively basic optimization, and the posted code has trivial complexity, making it overwhelmingly probable that the compiler will perform it.
Just in case anyone doubts that the compiler will optimize to the same result, here is a quick & dirty test program:
#include <iostream>
#include <time.h>

using namespace std;

size_t dummy;

class SomeObject
{
public:
    void Operation1();
    void Operation2();
    void Operation3();
    void Operation4();
    void Operation5();
    void Operation6();
};

void SomeObject::Operation1() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation2() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation3() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation4() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation5() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation6() { for (int i = 1; i < 100; i++) { dummy++; } }

void Foo1(SomeObject** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        ((*array) + i)->Operation1();
        ((*array) + i)->Operation2();
        ((*array) + i)->Operation3();
        ((*array) + i)->Operation4();
        ((*array) + i)->Operation5();
        ((*array) + i)->Operation6();
    }
}

void Foo2(SomeObject** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        SomeObject& obj = *((*array) + i);
        obj.Operation1();
        obj.Operation2();
        obj.Operation3();
        obj.Operation4();
        obj.Operation5();
        obj.Operation6();
    }
}

int main(int argc, char * argv[])
{
    clock_t timer;
    SomeObject * array[100];

    for (int i = 0; i < 100; i++)
    {
        array[i] = new SomeObject();
    }

    timer = clock();
    for (int i = 0; i < 100000; i++) { Foo1(array, 100); }
    cout << "Foo1: " << clock() - timer << endl;

    timer = clock();
    for (int i = 0; i < 100000; i++) { Foo2(array, 100); }
    cout << "Foo2: " << clock() - timer << endl;

    for (int i = 0; i < 100; i++)
    {
        delete array[i];
    }

    return 0;
}
The results are always within a few milliseconds of each other:
Foo1: 15437
Foo2: 15484
Imho, the question of which version is faster is irrelevant. Calling six different methods one after the other on an object is an OO design smell; the object should probably offer a single method that does all of that.
Foo2 has that extra reference binding, but other than that, the compiler should make them about the same.
I like this question, and can't help but answer, though I know I may be completely wrong. Nevertheless, I guess Foo1 will be faster.

My stupid reason? Well, I see that Foo2 creates an object reference, gets the address from "array", and then the calls to its methods are made. But Foo1 uses the address directly, dereferences it, goes to the object's memory, and just calls the function directly. No unnecessary object reference is created in Foo1 the way it is in Foo2. And we do not know how deep the object's inheritance hierarchy is, or how many base class constructors would be called just to get the object reference, which is additional time taken. So I guess Foo1 is marginally faster. Please correct me, because I am confident I am wrong. Cheers!