Is there any special C-level programming technique for ARM (EABI) architecture?

2023-02-07 00:11 Q&A author:

I am interested in any advice on special C programming techniques for ARM CPU targets using GCC with EABI. My applications perform floating-point-intensive calculations on large data arrays, and the main goal is to get the fastest possible executable. I mostly use the CodeSourcery and android-ndk arm-eabi-gcc toolchains. I also don't want to use NEON intrinsics or make any changes to the C code that would be incompatible with other compilers for other architectures (such as the Intel compiler for IA32).

Since most ARM targets do not have an FPU, if you want the "fastest executable" you should consider using a fixed-point library. The Dr. Dobb's article "Optimizing Math-Intensive Applications with Fixed-Point Arithmetic" has a good explanation of CORDIC algorithms and provides complete source code for the library it discusses. The article is specifically about accelerating math-intensive code on ARM devices without an FPU. The reported results were typically a 4x speed-up over a floating-point implementation. Given that a VFP without vectorization (which the compiler is unlikely to support except through library code) gives a 5x speed-up, that is pretty good for a software implementation.

Note: I have used this library and found an error in the log() function. It is corrected by adding a 0x0LL to the end of the log_two_power_n_reversed[] array initialiser; I have confirmed this correction with the author. The link to the code in the article is broken; the code can be found at: ftp://ftp.drdobbs.com/sourcecode/ddj/2008/0804.zip

[EDIT] Oops, sorry: the article and code discuss a C++ implementation that uses operator and function overloading extensively to make use of the fixed type as transparent as possible. A good reason to use C++ compilation perhaps, but not what you asked for.
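For plain C, the same fixed-point idea can be sketched without operator overloading. The following is only a minimal illustration and is not taken from the Dr. Dobb's library: the Q16.16 layout and every name (fx16_t, fx16_mul, fx16_dot and so on) are assumptions made for this example, and it relies on the compiler performing an arithmetic right shift on negative values, as GCC does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits.
   Names and layout are illustrative only. */
typedef int32_t fx16_t;

#define FX16_FRAC_BITS 16
#define FX16_ONE       ((int32_t)1 << FX16_FRAC_BITS)

static inline fx16_t fx16_from_int(int32_t i)
{
    return (fx16_t)(i * FX16_ONE);
}

static inline fx16_t fx16_from_double(double d)
{
    /* Round to nearest; meant for building constants, not for inner loops. */
    return (fx16_t)(d * FX16_ONE + (d >= 0 ? 0.5 : -0.5));
}

static inline double fx16_to_double(fx16_t x)
{
    return (double)x / FX16_ONE;
}

/* Multiply: widen to 64 bits, then drop the extra fractional bits.
   Assumes >> on a negative value is an arithmetic shift (true for GCC). */
static inline fx16_t fx16_mul(fx16_t a, fx16_t b)
{
    return (fx16_t)(((int64_t)a * b) >> FX16_FRAC_BITS);
}

/* Divide: pre-scale the dividend so the quotient keeps 16 fractional bits.
   The caller must ensure b != 0. */
static inline fx16_t fx16_div(fx16_t a, fx16_t b)
{
    return (fx16_t)(((int64_t)a * FX16_ONE) / b);
}

/* Dot product over arrays: accumulate in 64 bits and shift once at the end,
   which saves work and loses less precision than shifting every product. */
static inline fx16_t fx16_dot(const fx16_t *a, const fx16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int64_t)a[i] * b[i];
    return (fx16_t)(acc >> FX16_FRAC_BITS);
}
```

Only integer operations appear in the hot paths (fx16_mul, fx16_div, fx16_dot), so nothing here depends on an FPU. Range and overflow checking are left out and would need attention in real code.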

If you want to maintain portability, my advice is "don't use floating point". Most ARM chips do not have an FPU and will have to emulate the operations in software.

In general, benchmark, change, and benchmark again. Any performance optimisation without thorough before/after performance measurements is futile.
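A before/after measurement can be as simple as the following sketch. It uses only the standard clock() function; the names bench_seconds, kernel_fn and scale_kernel are made up for this example, and scale_kernel merely stands in for whatever routine is being tuned.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void (*kernel_fn)(float *data, size_t n);

/* Runs fn `repeats` times and returns the best (smallest) CPU time in
   seconds for a single call. Returns -1.0 if repeats <= 0. */
double bench_seconds(kernel_fn fn, float *data, size_t n, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        clock_t t0 = clock();
        fn(data, n);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Stand-in kernel: one multiply and one add per element. */
void scale_kernel(float *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        data[i] = data[i] * 1.0001f + 0.5f;
}
```

Taking the best of several runs reduces noise from other processes. clock() has coarse resolution on some systems, so the array should be large enough for one call to take well over a millisecond.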

To maximize performance on a CPU without an FPU, you should also choose "soft floating point" instead of "hard floating point". This way, your executable is linked against floating-point libraries instead of relying on the kernel to trap illegal instructions and emulate them (which takes more time because of the context switching involved).

Of course, if you have a CPU with a hardware floating point unit, you should use hard floating point to make use of it.

Luckily, the EABI allows both types of executables and libraries to coexist peacefully.
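With GCC, the floating-point mode is chosen with the -mfloat-abi option (soft, softfp or hard). To check which mode a given build actually uses, one possible sketch is to inspect GCC's predefined macros. The function name arm_float_mode is invented for this example; the macros __arm__, __SOFTFP__ and __ARM_PCS_VFP are the ones GCC defines, and other compilers may not define them.

```c
#include <stddef.h>

/* Reports the floating-point mode this translation unit was compiled for,
   based on GCC's predefined macros. Illustrative helper, not a standard API. */
const char *arm_float_mode(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: floating point is done by library calls. */
    return "soft";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FPU instructions, FP arguments in VFP registers. */
    return "hard";
#else
    /* -mfloat-abi=softfp: FPU instructions, FP arguments in integer registers. */
    return "softfp";
#endif
}
```

Printing this string once at start-up is a cheap way to confirm that the toolchain flags had the intended effect before benchmarking.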

Alas, for the best number-crunching performance across all the ARM processors currently used in smartphones, the best method may be to do something completely different for each ARM architecture: scaled integer arithmetic or slower soft-float for FPU-less chips, overlappable floating point for chips with pipelined VFP hardware, and parallelized, non-portable NEON intrinsics for NEON-capable chips. You may have to code all of these and select the compute routine at run time after detecting the CPU architecture.
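The run-time selection could be sketched with a function pointer, roughly as below. Everything here is an assumption made for illustration: the enum cpu_kind, the functions detect_cpu_kind and select_dot, and the two kernels. The detection is only a placeholder based on compile-time macros; real code would ask the operating system instead. The NEON slot simply reuses the portable unrolled kernel, since no intrinsics are used.

```c
#include <stddef.h>

/* Hypothetical CPU classes matching the three cases described above. */
enum cpu_kind { CPU_NO_FPU, CPU_VFP, CPU_NEON };

typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Simple loop; stands in for the path used on FPU-less chips
   (a real version might use scaled integers instead). */
static float dot_plain(const float *a, const float *b, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Four independent accumulators, so a pipelined FPU can overlap
   the multiply-adds instead of waiting on a single dependency chain. */
static float dot_unrolled(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Placeholder detection using compile-time macros only.
   A real implementation would query the OS at run time. */
enum cpu_kind detect_cpu_kind(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return CPU_NEON;
#elif defined(__SOFTFP__)
    return CPU_NO_FPU;
#else
    return CPU_VFP;
#endif
}

/* Picks a kernel once; callers keep the pointer and reuse it. */
dot_fn select_dot(enum cpu_kind kind)
{
    switch (kind) {
    case CPU_NO_FPU: return dot_plain;
    case CPU_VFP:    return dot_unrolled;
    case CPU_NEON:   return dot_unrolled; /* placeholder for a NEON kernel */
    }
    return dot_plain;
}
```

A caller would do the selection once at start-up, for example dot_fn dot = select_dot(detect_cpu_kind());, and then call dot(a, b, n) in the hot path. Note that the unrolled version sums in a different order, so its results can differ from the plain loop in the last bits.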

Tags: android arm c gcc signal-processing
