What kinds of optimizations does LLVM perform, and which optimizations do its frontends have to implement themselves?
Note: this question is closely related to this one, so if you're interested in my question you should definitely read that other one and its answers too.
I can think of some optimizations an OOP language frontend could do. For example, when several const methods are called on the same object in sequence, with no intervening non-const calls on that object, the frontend could create a temporary variable to hold the result and cut out the repeated function calls. But I can't think of many more, so I'd like to ask people to build a longer list of examples.
I ask this because I want to create a small language as a pet project, and I'm not sure how to study this subject well. Maybe this is a case for community wiki? A comprehensive list of the optimizations the LLVM backend does and of those frontends should do themselves: what do you think?
Oh, and I know different frontends can have widely different needs, but my focus is on procedural/OOP languages.
This probably varies a lot by language. clang (C/C++) gets away with doing very little optimization in the frontend. The only optimization I can think of that is done for the performance of the generated code is that clang does some devirtualization of C++ methods in the frontend. clang also performs some other optimizations, like constant folding and dead-code elimination, but those are primarily done to speed up compilation, not to improve the generated code.
EDIT: Actually, thinking about it a bit more, I remembered one more important optimization clang does for C++: it knows a few tricks to elide copy constructors (google for NRVO, named return value optimization).
In some cases, a language-specific IR optimization pass can be useful. There is a SimplifyLibCalls pass which knows how to optimize calls into the C standard library. For the new Objective-C ARC language feature, clang puts some ARC-specific passes into the pipeline; those optimize out calls to various Objective-C runtime functions.
In general, implementing optimizations in the frontend is only helpful when the code has properties that cannot be encoded into the IR (e.g. that C++ objects have a constant vtable pointer). In practice, you most likely want to implement dumb code generation first and then see whether there are important cases that aren't being optimized; the optimizers can do some surprisingly complex transformations.
See also http://llvm.org/docs/tutorial/LangImpl7.html; using alloca appropriately is one thing that helps the optimizers substantially, even though it isn't really an optimization itself.
There are many, many optimizations that need only as much information as is kept in SSA form, which LLVM uses. SSA enables a great deal of control-flow and data-flow analysis.
On the other hand, the LLVM language is RISC-like, so a lot of high-level information is lost.
So the answer is: the frontend is capable of doing optimizations that require information which is lost after translation into SSA form. Examples that come to my mind:
- branch-preference optimizations, for example language extensions for declaring preferred branches (in the Linux kernel, some branches are marked as almost always taken)
- implementation of throwing and catching exceptions
- implementation of coroutines, and the dependency information they carry
- optimizations that grow code size sharply (like loop unswitching) might need to be applied only in specific places, chosen according to high-level information available to the frontend from the source code
- language features (reflection, for instance) that are translated into interlinked structures of pointers to pointers, which are hard to analyze at a low level: there, everything may look like a plain array access, while the high-level semantics carry constraints that could help the optimizer
- complex functions might be implemented differently depending on the available hardware. Take a few examples: matrix multiplication, the FFT (used in compression and decompression algorithms), big-number arithmetic, and so on. Depending on the underlying hardware, each might be implemented differently to achieve maximum performance. After translating the code into LLVM IR, it can be very costly (in terms of computational complexity) to swap in an implementation better suited to the available hardware. That is why the decision should be made by the frontend when compiling to the lower level.
Those are just a few ideas that, I hope, show the kinds of optimizations that might be involved.