C structure pointer dereferencing speed
I have a question regarding the speed of pointer dereferencing. I have a structure like so:
typedef struct _TD_RECT TD_RECT;
struct _TD_RECT {
double left;
do开发者_StackOverflow社区uble top;
double right;
double bottom;
};
My question is, which of these would be faster and why?
CASE 1:
TD_RECT *pRect;
...
for(i = 0; i < m; i++)
{
if(p[i].x < pRect->left) ...
if(p[i].x > pRect->right) ...
if(p[i].y < pRect->top) ...
if(p[i].y > pRect->bottom) ...
}
CASE 2:
TD_RECT *pRect;
double left = pRect->left;
double top = pRect->top;
double right = pRect->right;
double bottom = pRect->bottom;
...
for(i = 0; i < m; i++)
{
if(p[i].x < left) ...
if(p[i].x > right) ...
if(p[i].y < top) ...
if(p[i].y > bottom) ...
}
So in case 1, the loop is directly dereferencing the pRect pointer to obtain the comparison values. In case 2, new values were made on the function's local space (on the stack) and the values were copied from the pRect to the local variables. Through a loop there will be many comparisons.
In my mind, they would be equally slow, because the local variable is also a memory reference on the stack, but I'm not sure...
Also, would it be better to keep referencing p[] by index, or increment p by one element and dereference it directly without an index.
Any ideas? Thanks :)
You'll probably find it won't make a difference with modern compilers. Most of them would probably perform common subexpresion elimination of the expressions that don't change within the loop. It's not wise to assume that there's a simple one-to-one mapping between your C statements and assembly code. I've seen gcc pump out code that would put my assembler skills to shame.
But this is neither a C nor C++ question since the ISO standard doesn't mandate how it's done. The best way to check for sure is to generate the assembler code with something like gcc -S
and examine the two cases in detail.
You'll also get more return on your investment if you steer away from this sort of micro-optimisation and concentrate more on the macro level, such as algorithm selection and such.
And, as with all optimisation questions, measure, don't guess! There are too many variables which can affect it, so you should be benchmarking different approaches in the target environment, and with realistic data.
It is not likely to be a hugely performance critical difference. You could profile doing each option multiple times and see. Ensure you have your compiler optimisations set in the test.
With regards to storing the doubles, you might get some performance hit by using const. How big is your array?
With regards to using pointer arithmetic, this can be faster, yes.
You can instantly optimise if you know left < right in your rect (surely it must be). If x < left it can't also be > right so you can put in an "else".
Your big optimisation, if there is one, would come from not having to loop through all the items in your array and not have to perform 4 checks on all of them.
For example, if you indexed or sorted your array on x and y, you would be able, using binary search, to find all values that have x < left and loop through just those.
I think the second case is likely to be faster because you are not dereferencing the pointer to pRect on every loop iteration.
Practically, a compiler doing optimisation may notice this and there might be no difference in the code that is generated, but the possibility of pRect being an alias of an item in p[] could prevent this.
An optimizing compiler will see that the structure accesses are loop invariant and so do a Loop-invariant code motion, making your two cases look the same.
I will be surprised if even a totally non-optimized compile (- O0) will produce differentcode for the two cases presented. In order to perform any operation on a modern processor, the data need to loaded into registers. So even when you declare automatic variables, these variables will not exist in main memory but rather in one of the processors floating point registers. This will be true even when you do not declare the variables yourself and therefore I expect no difference in generated machine code even for when you declare the temporary variables in your C++ code.
But as others have said, compile the code into assembler and see for yourself.
精彩评论