
Reducing performance variations on Linux

I am trying to benchmark a piece of software that runs on an Intel Pentium with Linux on top of it. The problem is that I get considerable performance variations during consecutive test runs when using the RDTSC instruction. Runtimes of exactly the same piece of software vary between 5 million and 10 million clock cycles, so in the worst case I have an overhead of 100%. I am aware that there are performance variations caused by cache contention; however, is there maybe a way that I can eliminate other potential problems like interrupts, other processes, etc.?

I would be thankful for any useful tips on how to do this properly.

Many thanks, Kenny


Common problems in this general area are:

  • process migration in multi-CPU/multi-core systems
  • RDTSC not consistent across cores in multi-CPU/multi-core systems
  • other processes taking CPU time (also interrupts, I/O, screen activity, etc)
  • automatic CPU clock frequency scaling
  • VM page faults etc

Solutions:

  • If you're running a single-threaded process on a multi-CPU/multi-core system, then use CPU affinity to lock the process to a specific core. (Use taskset from the command line or call sched_setaffinity() from within your code.) A combined sketch of this, the clock_gettime() timing and the repeat loop appears after this list.

  • Make sure no other processes are taking CPU time, disable screen savers and other desktop animations, and make sure there are no screen updates while your code is running. Also, don't use e.g. printf to a GUI console window during the timed section - save any results output until after you've collected your last timestamp. (If possible, you could even consider killing the GUI completely.)

  • Use a more reliable timing method than RDTSC (I typically use clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...) on Linux).

  • Disable automatic clock frequency scaling (e.g. Linux: cpufreq-set)

  • Run your code in a loop, for say N repeats, preferably re-using the same memory allocations for any large data structures (to get rid of the effects of VM page faults etc). Ignore the first measurement and average the remaining N - 1 measurements.
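For reference, here is a minimal sketch combining the affinity, clock_gettime() and repeat-loop points above. The core number (0), the repeat count N and the workload() function are placeholders for illustration, not part of the original answer; on older glibc you may need to link with -lrt for clock_gettime().

```c
#define _GNU_SOURCE
#include <sched.h>      /* CPU_ZERO, CPU_SET, sched_setaffinity */
#include <stdio.h>
#include <time.h>       /* clock_gettime, CLOCK_PROCESS_CPUTIME_ID */

#define N 11            /* number of repeats; the first one is discarded */

static volatile unsigned long sink;

/* Placeholder for the real code under test; reuse the same buffers
   inside it so later runs don't pay for fresh page faults. */
static void workload(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9
         + (double)(b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    /* Pin the process to core 0 so it cannot migrate between measurements. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    double total = 0.0;
    for (int i = 0; i < N; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
        workload();
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);
        if (i > 0)                /* ignore the first (cold-cache) run */
            total += elapsed_ns(t0, t1);
    }
    /* Defer all output until after the last timestamp has been taken. */
    printf("mean over %d warm runs: %.0f ns\n", N - 1, total / (N - 1));
    return 0;
}
```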


Some general things: raise the test process priority (man 1 nice), stop as many other processes as possible, unload unused kernel modules, flush disk caches (so that background kernel threads have less work), maybe even reboot into single-user mode?
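If it is more convenient to raise the priority from inside the benchmark rather than launching it through nice, a minimal sketch using setpriority(2) could look like the following (the -20 value is just an example; negative values need root or CAP_SYS_NICE):

```c
#include <stdio.h>
#include <sys/resource.h>   /* setpriority, PRIO_PROCESS */

int main(void)
{
    /* -20 is the most favourable nice value; requires privilege. */
    if (setpriority(PRIO_PROCESS, 0, -20) != 0)
        perror("setpriority");

    /* ... run the benchmarked code here ... */
    return 0;
}
```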


The best way to reduce variations caused by the system environment would be running your benchmark in "single-user" mode, also known as runlevel 1 or "recovery mode".

You can boot into this mode by passing "-s" as a boot time option to the kernel, or you can switch a running system to it with "init 1".

In this mode, all daemons are stopped, and you are logged in as root. Pretty much anything that runs on the system runs from your interactive terminal.


Please make sure you deactivate frequency scaling in the BIOS and the operating system. Also it sounds like you are using a P4, so make sure you turn off hyperthreading.

I have encountered performance variations like you describe in the past, due to such things.

This page describes how to turn it on, which should give you what you need to turn it off.

You will also need to reboot your machine and look in the BIOS settings to determine whether it is doing it automatically, without the operating system knowing.
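On the operating-system side, one quick way to see whether frequency scaling is still active is to read the current cpufreq governor from sysfs. A small sketch, assuming your kernel exposes cpufreq at the usual sysfs path (the file may be absent on very old kernels or when scaling is disabled):

```c
#include <stdio.h>

int main(void)
{
    /* Path used by the cpufreq subsystem on most Linux kernels. */
    const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    char governor[64] = "";
    FILE *f = fopen(path, "r");

    if (!f) {
        perror(path);   /* no cpufreq here: scaling may already be off */
        return 1;
    }
    if (fgets(governor, sizeof(governor), f))
        printf("cpu0 governor: %s", governor);
    fclose(f);
    return 0;
}
```

For stable timings you generally want this to report performance rather than ondemand or powersave (e.g. set with cpufreq-set -g performance, as mentioned in an earlier answer).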


Have you considered running the code inside Valgrind's cachegrind or callgrind tools? These should be able to provide you with accurate instruction counts by running the code through Valgrind's "VM".
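If you go the callgrind route, you can also limit counting to just the region you care about with Valgrind's client-request macros, which compile to no-ops when the program is not run under Valgrind. A sketch, assuming the Valgrind development headers are installed and the binary is run with valgrind --tool=callgrind --collect-atstart=no (the code_under_test() function is a placeholder):

```c
#include <valgrind/callgrind.h>

static volatile unsigned long sink;

static void code_under_test(void)
{
    /* placeholder for the real workload */
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

int main(void)
{
    CALLGRIND_TOGGLE_COLLECT;   /* start collecting event counts */
    code_under_test();
    CALLGRIND_TOGGLE_COLLECT;   /* stop collecting */
    CALLGRIND_DUMP_STATS;       /* write out the counts for this region */
    return 0;
}
```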

