Optimization flags with g++
I am using g++ to compile C++ code for a scientific simulation. Currently I am using the -O3 and -funroll-loops flags. I noticed a big difference between -O0, -O1, -O2, and -O3, but almost no difference with -funroll-loops.
Do you have any suggestions or tricks that would help me squeeze out even better performance?
Thanks!
Edit, as suggested in the comments:
I am asking here about 'pure' compile-time optimization, i.e. whether there are cleverer things to do than just -O3. The compute-intensive part of the code manipulates blitz::array objects in huge loops.
Edit2: I actually deal with a lot of floating-point (double) math.
Without seeing the code, we can only give you generic advice that applies to a broad range of problems.
- Try GCC's profile-guided optimisation. Compile an instrumented build with -fprofile-generate, do a few test runs with a realistic workload, then use the profile data from those runs when building the final binary (-fprofile-use). GCC can then make better guesses about which branches are taken and optimise the code accordingly.
- Try to parallelise your code if you can. You mentioned you have loops over big data sets; this works if the work items are independent and can be partitioned. For example, keep a work queue and a worker thread pool sized to the number of CPUs, dispatch work items to the queue instead of processing them sequentially, and let the pool threads grab items off the queue and process them in parallel.
- Look at the size of the data units your code works with and try to fit them in as few L1 cache lines as possible (a line is usually 64 bytes). For example, if you have 66-byte data items and your cache line size is 64 bytes, it may be worth packing the structure, or otherwise squeezing it, to fit in 64 bytes.
It's hard to tell without knowing the code you want to accelerate. Also, knowing the code may allow us to make improvements to it, to make it faster.
As general advice, try specifying the -march option to tell GCC which CPU model you are targeting (e.g. -march=native for the build machine). You can try -fomit-frame-pointer if you make many function calls (especially recursive ones). If you use floating-point math heavily and stay away from corner cases (e.g. NaNs, FP exceptions), you can try -ffast-math. That last one may buy you a huge speedup, but in some cases it produces wrong results. Analyse your code to ensure it is safe.
I don't have enough mojo to comment or to edit Alex B's answer so I will answer instead.
After you turn on profiling and run your application per Alex B's answer, actually look at the profile information for hot spots where your application spends most of its time. If you find any, examine the code to see what you can do to make them less hot.
Appropriate algorithm replacement will generally outperform any automated optimization by a wide margin.