Is there any special C-level programming technique for ARM (EABI) architecture?
I am interested in any advice about special C-programming techniques for ARM-CPU targets using GCC with EABI. My applications contain floating-point-intensive calculations on large data arrays. The main goal is to get the fastest executable. Mostly, I use the CodeSourcery and Android NDK versions of arm-eabi-gcc. I also don't want to use NEON intrinsics or make any changes to the C code that are incompatible with other compilers for other architectures (like the Intel compiler for IA-32).
Since most ARM targets do not have an FPU, if you want the "fastest executable", you should consider using a fixed-point library. The Dr. Dobb's article "Optimizing Math-Intensive Applications with Fixed-Point Arithmetic" has a good explanation of CORDIC algorithms and provides complete source code for the library it discusses. The article is specifically about accelerating math-intensive code on ARM devices without an FPU. The reported results were typically a 4x speed-up over a floating-point implementation. Given that a VFP used without vectorization (which the compiler is unlikely to support except through library code) gives about a 5x speed-up, that is pretty good for a software implementation.
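The core idea behind any fixed-point library is storing real values as scaled integers, so the hot loops run entirely on the integer ALU. The sketch below is not the Dr. Dobb's library or its API (that one is C++ and also covers CORDIC functions such as log()); it is a minimal C illustration assuming a Q16.16 format (16 integer bits, 16 fractional bits) and hypothetical helper names (q_from_double, q_mul, q_div, q_dot).

```c
#include <stddef.h>
#include <stdint.h>

/* Q16.16 fixed point: a real value x is stored as the integer x * 2^16. */
typedef int32_t q16_16;

#define Q16_ONE (1 << 16)

/* Conversions use floating point (and truncate toward zero),
   so do them once, outside the hot loops. */
static q16_16 q_from_double(double x) { return (q16_16)(x * Q16_ONE); }
static double q_to_double(q16_16 x) { return (double)x / Q16_ONE; }

/* Multiply: widen to 64 bits so the product cannot overflow, then drop the
   extra 16 fractional bits. Assumes >> on a negative value is an arithmetic
   shift, which is what GCC does. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> 16);
}

/* Divide: scale the dividend up by 2^16 first so the quotient keeps
   16 fractional bits. b must be non-zero. */
static q16_16 q_div(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}

/* Dot product over arrays using only integer instructions. No overflow or
   saturation checks: the caller must keep values within about +/-32768. */
static q16_16 q_dot(const q16_16 *a, const q16_16 *b, size_t n) {
    q16_16 acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += q_mul(a[i], b[i]);
    return acc;
}
```

The trade-off of this format is a fixed range (about +/-32768) with a fixed resolution of 1/65536; a real library would add saturation, rounding and transcendental functions on top of these primitives.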
Note: I have used this library and found an error in the log() function. It is corrected by adding 0x0LL to the end of the log_two_power_n_reversed[] array initialiser; I have confirmed this correction with the author. The link to the code in the article is broken; you can find it at: ftp://ftp.drdobbs.com/sourcecode/ddj/2008/0804.zip
[EDIT] Oops, sorry: the article and code discuss a C++ implementation, using operator and function overloading extensively to make the use of the fixed type as transparent as possible. That is perhaps a good reason to use C++ compilation, but not what you asked for.
If you want to maintain portability, my advice is "don't use floating point". Most ARM chips do not have an FPU and will have to emulate the operations in software.
In general, benchmark, change, and benchmark again. Any performance optimisation without thorough before/after performance measurements is futile.
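For the benchmark/change/benchmark loop, a minimal timing harness using only standard C clock() might look like this. The kernel kernel_sum_squares and the helper time_kernel are hypothetical placeholders; substitute the routine you are actually optimising and run it on the target device with realistic array sizes.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel under test: sum of squares of a float array. */
static float kernel_sum_squares(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}

/* Calls fn(x, n) `reps` times and returns the average processor time per call
   in seconds, as measured by clock(). Each result is added to *sink, a
   volatile object, which discourages the optimizer from dropping the calls. */
static double time_kernel(float (*fn)(const float *, size_t),
                          const float *x, size_t n, int reps,
                          volatile float *sink) {
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += fn(x, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Keep the compiler flags, input data and repetition count identical between the "before" and "after" runs; otherwise the comparison says nothing about the change itself.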
To maximize performance on a CPU without an FPU, you should also choose "soft floating point" instead of "hard floating point". This way, your executable will be linked against floating-point libraries instead of relying on the kernel to trap the illegal instructions and emulate them (which takes more time because of the context switching involved).
Of course, if you have a CPU with a hardware floating point unit, you should use hard floating point to make use of it.
Luckily, the EABI allows executables and libraries of both types to coexist peacefully.
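With GCC the choice is made with -mfloat-abi=soft, -mfloat-abi=softfp or -mfloat-abi=hard (plus an -mfpu= option such as -mfpu=vfp for the last two). The softfp mode is the one that makes the coexistence work: it uses VFP instructions but keeps the soft-float calling convention, so it links with soft-float libraries. To check which variant a build actually produced, a minimal sketch like the one below can be compiled into the program. It assumes GCC's predefined macros (__arm__ for 32-bit ARM, __SOFTFP__ for -mfloat-abi=soft, __ARM_PCS_VFP for -mfloat-abi=hard) and treats anything else on ARM as softfp; float_abi_name is a hypothetical helper.

```c
/* Reports the floating-point ABI this file was compiled for, using macros
   that GCC predefines for ARM targets. */
static const char *float_abi_name(void) {
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: every FP operation is a library call. */
    return "soft";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: VFP instructions, FP arguments in VFP registers. */
    return "hard";
#else
    /* -mfloat-abi=softfp: VFP instructions, soft-float calling convention. */
    return "softfp";
#endif
}
```

For example, building with arm-eabi-gcc -mfloat-abi=softfp -mfpu=vfp should make this report "softfp".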
Alas, for the best number-crunching performance across all the ARM processors currently used in smartphones, the most effective method may be to do something completely different for each ARM architecture: scaled integer arithmetic or slower soft-float for FPU-less chips, overlappable floating-point operations for chips with pipelined VFP hardware, and parallelized, non-portable NEON intrinsics for NEON-capable chips. You may have to code all of these and select the compute routine at run time after detecting the CPU architecture.
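Run-time selection can be sketched with a function pointer chosen once at start-up. This is a minimal sketch under several assumptions: the names dot_fn, dot_simple, dot_overlapped, cpu_has_vfp and select_dot are hypothetical; detection uses glibc's getauxval(AT_HWCAP) and the HWCAP_VFP bit only on 32-bit ARM Linux with glibc 2.16 or later (Android needs a different mechanism, such as the NDK's cpufeatures library), and everywhere else it falls back to the simple loop. The "overlapped" variant uses four independent accumulators so consecutive VFP multiply-adds do not stall waiting on each other. A NEON variant would live in a separately compiled file and become a third choice.

```c
#include <stdlib.h> /* size_t, NULL; also makes __GLIBC__ visible on glibc */

#if defined(__arm__) && defined(__linux__) && defined(__GLIBC__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_VFP on 32-bit ARM Linux */
#define HAVE_GETAUXVAL 1
#endif

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Straightforward loop: the fallback, e.g. for FPU-less chips using soft-float. */
static float dot_simple(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Four independent accumulators, so a pipelined VFP can overlap consecutive
   multiply-adds. The summation order differs from dot_simple, so results
   may differ in the last bits. */
static float dot_overlapped(const float *a, const float *b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Hypothetical detection: asks the kernel for the HWCAP_VFP bit where that is
   possible, otherwise assumes no VFP. */
static int cpu_has_vfp(void) {
#ifdef HAVE_GETAUXVAL
    return (getauxval(AT_HWCAP) & HWCAP_VFP) != 0;
#else
    return 0;
#endif
}

/* Pick the routine once, at start-up, then call it through the pointer. */
static dot_fn select_dot(void) {
    return cpu_has_vfp() ? dot_overlapped : dot_simple;
}
```

A program would call select_dot() once during initialisation, store the result, and use that pointer in its inner loops, so the detection cost is paid only once.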