Performance difference between multiple pointer dereferences vs references
This was an interview question I was asked a few months back:
Which of the following functions will execute faster, Foo1 or Foo2?
void Foo1(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            ((*array) + i)->Operation1();
            ((*array) + i)->Operation2();
            ((*array) + i)->Operation3();
            ((*array) + i)->Operation4();
            ((*array) + i)->Operation5();
            ((*array) + i)->Operation6();
        }
    }
}
void Foo2(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            SomeObjectArray& obj = *((*array) + i);
            obj.Operation1();
            obj.Operation2();
            obj.Operation3();
            obj.Operation4();
            obj.Operation5();
            obj.Operation6();
        }
    }
}
Note that this is from memory, so I don't remember the code exactly, but the general idea is the same: one function uses the pointer directly while the other binds a reference (it may have had a pointer to an array like the code above, but I don't remember exactly). I said I wasn't sure and would have to profile the code to find out, but that if I had to guess, Foo2 'may' be faster. They were not impressed...
This nagged me a couple of times here and there when I ran across code similar to this (or wrote it), and I would wonder what I should do in such a case.
I know that...
- This is a micro optimization
- The compiler will most likely optimize it
EDIT: I have changed the code slightly so that now it is checking for a NULL pointer.
I think this is a pretty interesting question, and I saw a lot of speculation about what a compiler might do, but I wanted to take a closer look and find out for sure. So I took e.James' program and ran it through GCC to get the assembly. I should say that I don't really know assembly, so somebody correct me if I'm wrong, but I think we can reasonably infer what's going on. :)
Compiling with -O0 (no optimization)
For Foo1, we see that the array offset is calculated before each function call:
movl 8(%ebp), %eax
movl (%eax), %edx
movl -4(%ebp), %eax
leal (%edx,%eax), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
That's repeated for all six method calls, just with different method names. Foo2 has a little bit of setup code to get the reference:
movl 8(%ebp), %eax
movl (%eax), %edx
movl -4(%ebp), %eax
leal (%edx,%eax), %eax
movl %eax, -8(%ebp)
And then six of these, which is just a copy of the saved address onto the stack and a function call:
movl -8(%ebp), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Pretty much what we would expect with no optimization. The output was
Foo1: 18472
Foo2: 17684
Compiling with -O1 (minimal optimization)
Foo1 is a little more efficient, but still adds up the array offset before each call:
movl %esi, %eax
addl (%ebx), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Foo2 saves the computed address in %ebx (addl (%edi), %ebx), and then makes six calls like this:
movl %ebx, (%esp)
call __ZN10SomeObject10Operation1Ev
The timings here were
Foo1: 4979
Foo2: 4977
Compiling with -O2 (moderate optimization)
When compiling with -O2, GCC just got rid of the whole thing, and each call to Foo1 or Foo2 simply resulted in adding 594 to dummy (99 increments * 6 calls = 594 increments):
imull $594, %eax, %eax
addl %eax, _dummy
There were no calls to the object's methods, although those methods remained in the code. As we might expect, the times here were
Foo1: 1
Foo2: 0
I think this tells us that Foo2 is a little faster without optimizations, but really it's a moot point, because once the compiler starts optimizing, it is just moving a couple of values between the stack and the registers.
Strictly speaking, with no optimizations at all, I'd say Foo2 is faster, because Foo1 has to recalculate the indirect pointer each time; but that's not going to happen with any real compiler.
I'd say the compiler will optimize it and leave it the same.
Looks like there is plenty of room for the compiler here: i and array don't change within the body of each iteration, so it could put the pointer into a register, exactly as it would for the reference.
Likely the compiler will optimize them to be the same, given the common sub-expressions on each line. No guarantees though.
There aren't any rational conclusions you can reach by armchair reasoning like this, with today's compilers and processors. The only way to know is to try it and time it. If someone didn't make that clear in their interview answer, it would be an automatic fail from me.
Neither; any reasonable compiler will happily make them equivalent. You're not talking about NRVO in the depths of template metaprogramming here: this is a simple case of common subexpression elimination, which is an extremely common and relatively basic optimization, and the posted code has trivial complexity, making it overwhelmingly probable that the compiler will perform it.
Just in case anyone doubts that the compiler will optimize to the same result, here is a quick & dirty test program:
#include <iostream>
#include <time.h>

using namespace std;

size_t dummy;

class SomeObject
{
public:
    void Operation1();
    void Operation2();
    void Operation3();
    void Operation4();
    void Operation5();
    void Operation6();
};

void SomeObject::Operation1() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation2() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation3() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation4() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation5() { for (int i = 1; i < 100; i++) { dummy++; } }
void SomeObject::Operation6() { for (int i = 1; i < 100; i++) { dummy++; } }

void Foo1(SomeObject** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        ((*array) + i)->Operation1();
        ((*array) + i)->Operation2();
        ((*array) + i)->Operation3();
        ((*array) + i)->Operation4();
        ((*array) + i)->Operation5();
        ((*array) + i)->Operation6();
    }
}

void Foo2(SomeObject** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        SomeObject& obj = *((*array) + i);
        obj.Operation1();
        obj.Operation2();
        obj.Operation3();
        obj.Operation4();
        obj.Operation5();
        obj.Operation6();
    }
}

int main(int argc, char * argv[])
{
    clock_t timer;
    SomeObject * array[100];

    for (int i = 0; i < 100; i++)
    {
        array[i] = new SomeObject();
    }

    timer = clock();
    for (int i = 0; i < 100000; i++) { Foo1(array, 100); }
    cout << "Foo1: " << clock() - timer << endl;

    timer = clock();
    for (int i = 0; i < 100000; i++) { Foo2(array, 100); }
    cout << "Foo2: " << clock() - timer << endl;

    for (int i = 0; i < 100; i++)
    {
        delete array[i];
    }

    return 0;
}
The results are always within a few milliseconds of each other:
Foo1: 15437
Foo2: 15484
Imho, the question of which version is faster is irrelevant. Calling six different methods one after the other on an object is an OO design smell; the object should probably offer a single method that does all of that.
Foo2 has that extra reference binding, but other than that, the compiler should make them about the same.
I like this question, and can't help but answer, though I know I may be completely wrong. Nevertheless, I guess Foo1 will be faster.

My stupid reason? Well, I see that Foo2 creates an object reference, gets the address from "array", and then the calls to its methods are made. But Foo1 uses the address directly, dereferences it, goes to the object's memory, and just calls the function directly. No unnecessary object reference is created in Foo1 the way it is in Foo2. And we do not know how deep the object's inheritance hierarchy is, or how many base class constructors would be called just to get the object reference, which is additional time taken. So I guess Foo1 is marginally faster. Please correct me, because I am confident I am wrong. Cheers!