Writing code to help the compiler to do optimizations
Does anyone know of a list of what a compiler does to optimize source code? I'd prefer GCC as the example.
I want to know what a programmer should do to the code to get good optimization and to help the compiler optimize it. Some optimizations made by the programmer may prevent the compiler from doing better ones.
Examples:
Replace
for (int i = 0; i < n - 1; i++ )
with
int n2 = n - 1;
for (int i = 0; i < n2; i++ )
Replace
for (int i = 0; i < n/2; i++ )
with
int n2 = n/2;
for (int i = 0; i < n2; i++ )
Replace
for (int i = 0; i < obj.calc_value(); i++ ) // calc_value() returns the same result as long as obj is unchanged
with
int y = obj.calc_value();
for (int i = 0; i < y; i++ )
It is important to keep the code simple to read and understand.
Thanks
Edit:
Other examples:
- Inline functions
- Remove recursion
Seriously, just leave that up to the compiler. I've seen the code that gcc outputs at its "insane" -O3 level and it's proof positive that the people who wrote those optimisation engines are either aliens or from a substantially distant future time.
I've yet to see a situation where register or inline made an appreciable difference in the performance of my code. That doesn't mean it won't, just that the compiler writers know far more tricks than us mere mortals when it comes to extracting the last ounce of performance from the processor.
As far as optimisation goes, it should only be done where there is a real problem. That means profiling code and discovering bottlenecks but, more importantly, not optimising an operation that is not deemed slow in context. There is zero difference to a user whether a one-shot operation takes a tenth of a second or a hundredth.
And sometimes, optimisation for readability is the best one you can do :-)
As an aside, this is just one of the nifty tricks gcc does for you. Consider the following code which is supposed to calculate the factorial and return it:
static int fact (unsigned int n) {
if (n == 0) return 1;
return n * fact (n-1);
}
int main (void) {
return fact (6);
}
This compiles to (at -O3):
main: pushl %ebp ; stack frame setup.
movl $720, %eax ; just load 720 (6!) into eax.
movl %esp, %ebp ; stack frame
popl %ebp ; tear-down.
ret ; and return.
That's right, gcc just works it all out at compile-time and turns the whole thing into the equivalent of:
int main (void) { return 720; }
Contrast this with the -O0 (naive) version:
main: pushl %ebp ; stack
movl %esp, %ebp ; frame
andl $-16, %esp ; set
subl $16, %esp ; up.
movl $6, (%esp) ; pass 6 as parameter.
call fact ; call factorial function.
leave ; stack frame tear down.
ret ; and exit.
fact: pushl %ebp ; stack
movl %esp, %ebp ; frame
subl $24, %esp ; set up.
cmpl $0, 8(%ebp) ; passed param zero?
jne .L2 ; no, keep going.
movl $1, %eax ; yes, set return to 1.
jmp .L3 ; goto return bit.
.L2: movl 8(%ebp), %eax ; get parameter.
subl $1, %eax ; decrement.
movl %eax, (%esp) ; pass that value to next level down.
call fact ; call factorial function.
imull 8(%ebp), %eax ; multiply return value by passed param.
.L3: leave ; stack frame tear down.
ret ; and exit.
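For what it's worth, C++11 lets you ask for this kind of compile-time evaluation explicitly instead of relying on the optimiser. The following is only a minimal sketch, assuming a C++11 compiler: fact is rewritten as a constexpr function, and checked_fact is a hypothetical helper that is not part of the original example (its limit of 12 assumes a 32-bit unsigned int).

```cpp
#include <cassert>

// C++11 constexpr version of the fact() function shown above.
// A constexpr function may be evaluated by the compiler when its
// argument is a constant expression.
constexpr unsigned int fact(unsigned int n) {
    return n == 0 ? 1u : n * fact(n - 1);
}

// Forces evaluation at compile time: the build fails if 6! is not 720.
static_assert(fact(6) == 720, "6! should be 720");

// Hypothetical helper: the same function can still be called at run time.
// 12! is the largest factorial that fits in a 32-bit unsigned int
// (an assumption about the platform), so larger inputs are rejected.
unsigned int checked_fact(unsigned int n) {
    assert(n <= 12);
    return fact(n);
}
```

The static_assert documents the expectation in the source itself, which the -O3 output above only shows after the fact.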
All your suggested improvements are examples of loop-invariant code motion, which is an elementary optimisation that pretty much every optimising compiler does.
The range of optimisations performed by real compilers is much more advanced than these examples. The Wikipedia article linked above has some links for further reading.
As for the code you've posted, I agree with paxdiablo's answer. Compilers can optimize very well without hints in a lot of cases.
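To make the loop-invariant code motion point concrete, here is a small sketch of the n/2 example from the question in both spellings. The function names are hypothetical and chosen only for illustration. An optimising compiler can typically hoist the bound itself when it can prove the bound does not change inside the loop; for a call such as obj.calc_value() that proof may not be possible if the compiler cannot see the function body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// As written in the question: the bound v.size() / 2 appears in the
// loop condition and is, on paper, re-evaluated on every iteration.
long sum_first_half_plain(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size() / 2; i++) {
        total += v[i];
    }
    return total;
}

// The same loop with the invariant bound hoisted by hand.
long sum_first_half_hoisted(const std::vector<int>& v) {
    const std::size_t half = v.size() / 2;
    long total = 0;
    for (std::size_t i = 0; i < half; i++) {
        total += v[i];
    }
    return total;
}

// Sanity check: both spellings must agree for any input.
void check_same_result(const std::vector<int>& v) {
    assert(sum_first_half_plain(v) == sum_first_half_hoisted(v));
}
```

Comparing the assembly of the two functions (for example with gcc -O2 -S) is a quick way to see whether the manual version buys anything on a given compiler.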
Template Metaprogramming
If you're looking to help the compiler optimize (and have a reason to - profile, profile, profile!), the potentially most useful trick I've seen is template metaprogramming.
Boost has some direct support for template metaprogramming.
A useful example is template meta-programmed matrix math libraries that reduce the number of operations done, while leaving those operations in your source code. They also evaluate some operations completely at compile time.
Here's the first of those that shows up on Google: http://arma.sourceforge.net/
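As a rough idea of what template metaprogramming looks like, here is a toy sketch of a fixed-size dot product whose recursion is expanded by the compiler instead of running as a loop. It is a hypothetical illustration only: it is not taken from Boost or from the library linked above, and real matrix libraries use far more elaborate techniques.

```cpp
#include <cassert>
#include <cstddef>

// Recursive case: multiply the first pair of elements, then let the
// compiler instantiate Dot<N - 1> for the remaining ones.
template <std::size_t N>
struct Dot {
    static double eval(const double* a, const double* b) {
        return a[0] * b[0] + Dot<N - 1>::eval(a + 1, b + 1);
    }
};

// Base case: an empty product contributes nothing.
template <>
struct Dot<0> {
    static double eval(const double*, const double*) { return 0.0; }
};

// Convenience wrapper: the array length N is deduced at compile time.
template <std::size_t N>
double dot(const double (&a)[N], const double (&b)[N]) {
    return Dot<N>::eval(a, b);
}

// Small usage example: 1*4 + 2*5 + 3*6 = 32.
void dot_example() {
    const double a[3] = {1.0, 2.0, 3.0};
    const double b[3] = {4.0, 5.0, 6.0};
    assert(dot(a, b) == 32.0);
}
```

The source still reads as a single call to dot, while the number of multiplications is fixed when the template is instantiated. Whether this beats a plain loop is something to measure, since compilers often unroll short fixed-length loops on their own.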
Const Correctness
Another thing you should investigate, since it can easily help bug-proof your code, is const-correctness. An added benefit is that it can sometimes help with compiler optimizations.
Both const-correctness and template-metaprogramming are pretty hard to master, but very useful. Such is most of C++ :)
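For readers who haven't met the term, this is roughly what const-correctness looks like in practice. The Samples class and range_width function are hypothetical names made up for this sketch. The main payoff shown here is the compile-time check; whether the generated code changes depends on the compiler and the situation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical class used only for illustration.
class Samples {
public:
    explicit Samples(std::vector<int> values) : values_(std::move(values)) {}

    // const member function: promises not to modify the object, so it
    // can be called through a const reference.
    int max() const {
        assert(!values_.empty());
        int best = values_[0];
        for (int v : values_) {
            if (v > best) best = v;
        }
        return best;
    }

    // Non-const member function: modifies the object.
    void add(int v) { values_.push_back(v); }

private:
    std::vector<int> values_;
};

// Takes a const reference: the function can call max() but not add().
// Calling s.add(...) here would be rejected by the compiler.
int range_width(const Samples& s, int floor) {
    return s.max() - floor;
}
```

A caller reading the signature of range_width knows the object will not be changed, without having to inspect the body.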
No matter what the "don't optimizers" here tell you, it is always possible to help the (C in my case) compiler produce better code.
The compiler will be better than most programmers at tactically ordering code to make optimal use of the available pipelines and execution units. Having said that, it is also important to point out that the compiler is not as good at the strategic programming level, i.e. the level above individual procedures.
Poorly-performing software is far more common than well-performing software. One thing the "don't optimizers" seldom admit (and definitely not in the answers to your post) is that since it is (obviously) possible to write low-performing software, it is also (obviously) possible to write well-performing software. Compilers are not alchemy: feed them garbage source code and they will produce garbage machine code. Feed them competent source code and they will certainly produce competent machine code.
I'm ambivalent on the subject of optimisation. On the one hand if a performance issue has been pinpointed (most often when development is nearing completion) it is usually too late to get any significant improvement (say 100% or greater) because that requires work at the strategic level (which there is no time for). On the other, if it's done earlier the "complete performance picture" is not available so the "don't optimizers" will say it's too early to optimize because there is no valid data. Optimisation is all too often something that is done in panic to salvage some poorly-constructed software or perhaps as life-support through acceptance tests. Optimisation in a positive context (making good code execute more efficiently) is more or less unheard of (at least here?).
I enjoy the challenges of writing code which performs well. My code is readable but from first-hand experience I choose certain code constructs because I know they will perform better. Also I instrument my applications from the very beginning of development such that they measure themselves and then I can keep an eye on performance as development moves along.
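One possible way to make an application measure itself is a scoped timer that records how long each labelled region takes. This is a minimal sketch, not the instrumentation described above: it is written in C++ for consistency with the rest of the thread, and Stat, stats, ScopedTimer and sum_to are all hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated time and call count for one labelled region.
struct Stat {
    std::chrono::steady_clock::duration total{};
    long calls = 0;
};

// Process-wide registry of measurements, keyed by label.
inline std::map<std::string, Stat>& stats() {
    static std::map<std::string, Stat> registry;
    return registry;
}

// Starts timing on construction and records the elapsed time when it
// goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        Stat& st = stats()[label_];
        st.total += std::chrono::steady_clock::now() - start_;
        st.calls += 1;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Example of an instrumented function: sums the integers 1..n.
long sum_to(long n) {
    assert(n >= 0);
    ScopedTimer timer("sum_to");
    long total = 0;
    for (long i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Dumping the contents of stats() at exit, or on demand, gives a running picture of where time goes as development moves along. This sketch is not thread-safe; a multi-threaded program would need to protect the registry.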