Loop counter & pointers
I'm writing a small specialized C99 lib开发者_运维技巧rary for graphs and I often get loops of the form:
for(int i = 0; i < graph->nvertices; ++i) {
// ...
}
I'm wondering if it's a good practice, especially in the case of tigh loop. At first I thought the compiler would be smart enough to look at 'graph->nvertices' only once instead of looking at it at every iteration, but it seems impossible as graph->nvertices could change inside the loop. Is is smarter and faster to write:
const int N = graph->nvertices;
for(int i = 0; i < N; ++i) {
// ...
}
It seems faster as it doesn't need look at the pointer more than once, but it does require the creation of a new variable.
Note: I guess this is the kind of situation where it's nice to be able to read a little assembly code to see what the compiler is actually doing, if someone has a nice reference I'm open to suggestions.
Try using higher optimization settings, some compilers should be able to optimize that away for you. You can also iterate backwards, only initializing the counter with the expression:
for (int i = graph->nvertices; i >= 0; --i)
..
However, you're going to wreak havoc on cache performance. I think the method you suggested is most straightforward, which lends itself to be clear to the compiler and to the next person reading your code.
I tend to do these optimizations myself. Sometimes the compiler can infer that nvertices
doesn't change along the whole loop, but what if you have call to other functions that may change the value? The compiler cannot infer it and may not optimize the code.
Also, the best way always is profiling your code to see the comparison between the two approaches.
i use this usually.
const int N = graph->nvertices;
int i = 0;
for(; i < N; ++i) {
// ...
}
The answer may depend on your compiler, but most compilers will produce an assembly listing for you that you can study. Just make sure you list the release build.
However, if I was really concerned about every bit of performance, I might create the separate count variable as you suggest.
In the end, I doubt it would make a noticeable amount of difference.
You asked for a way to look at the assembler code. For this you can use the objdump
program like this:
objdump -d executable
To filter out the main-function, use:
objdump -d executable | sed -n '/<main>/,/^$/p'
This seems a fairly easy thing to test, simply by looking at the output of the compiler. Whole program optimization usually catches these low level optimizations.
On the other hand, I usually won't make these types of optimizations even if there is a slight increase in speed, since I find the first form easier to read and maintain.
I'd go for localizing the scope of the variable. The optimizer will then figure out all by itself:
for(size_t i = 0, n = graph->nvertices; i < n; ++i) {
// ...
}
精彩评论