
Reducing performance variations on Linux

I am trying to benchmark a piece of software that runs on an Intel Pentium with Linux on top of it. The problem is that I get considerable performance variations during consecutive test runs when using the RDTSC instruction. Runtimes of exactly the same piece of software vary between 5 million and 10 million clock cycles, so in the worst case I have an overhead of 100%. I am aware that there are performance variations caused by cache contention; however, is there maybe a way that I can eliminate other potential problems like interrupts, other processes, etc.?

I would be thankful for any useful tips on how to do this properly.

Many thanks, Kenny


Common problems in this general area are:

  • process migration in multi-CPU/multi-core systems
  • RDTSC not consistent across cores in multi-CPU/multi-core systems
  • other processes taking CPU time (also interrupts, I/O, screen activity, etc)
  • automatic CPU clock frequency scaling
  • VM page faults etc

Solutions:

  • If you're running a single-threaded process on a multi-CPU/multi-core system, then use CPU affinity to lock the process to a specific core. (Use taskset from the command line or call sched_setaffinity() from within your code.) A combined sketch of this, the clock_gettime() timing and the repeat loop appears after this list.

  • Make sure no other processes are taking CPU time, disable screen savers and other desktop animations, and make sure there are no screen updates while your code is running. Also, don't use e.g. printf to a GUI console window during the timed section - save any results output until after you've collected your last timestamp. (If possible, you could even consider killing the GUI completely.)

  • Use a more reliable timing method than RDTSC (I typically use clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...) on Linux).

  • Disable automatic clock frequency scaling (e.g. Linux: cpufreq-set)

  • Run your code in a loop, for say N repeats, preferably re-using the same memory allocations for any large data structures (to get rid of the effects of VM page faults etc). Ignore the first measurement and average the remaining N - 1 measurements.
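For reference, here is a minimal sketch combining the affinity, clock_gettime() and repeat-loop points above. The core number (0), the repeat count N and the workload() function are placeholders for illustration, not part of the original answer; on older glibc you may need to link with -lrt for clock_gettime().

```c
#define _GNU_SOURCE
#include <sched.h>      /* CPU_ZERO, CPU_SET, sched_setaffinity */
#include <stdio.h>
#include <time.h>       /* clock_gettime, CLOCK_PROCESS_CPUTIME_ID */

#define N 11            /* number of repeats; the first one is discarded */

static volatile unsigned long sink;

/* Placeholder for the real code under test; reuse the same buffers
   inside it so later runs don't pay for fresh page faults. */
static void workload(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9
         + (double)(b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    /* Pin the process to core 0 so it cannot migrate between measurements. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    double total = 0.0;
    for (int i = 0; i < N; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
        workload();
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);
        if (i > 0)                /* ignore the first (cold-cache) run */
            total += elapsed_ns(t0, t1);
    }
    /* Defer all output until after the last timestamp has been taken. */
    printf("mean over %d warm runs: %.0f ns\n", N - 1, total / (N - 1));
    return 0;
}
```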


Some general things: raise the test process priority (man 1 nice), stop as many other processes as possible, unload unused kernel modules, flush disk caches (so that background kernel threads have less work), maybe even reboot into single-user mode?
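If it is more convenient to raise the priority from inside the benchmark rather than launching it through nice, a minimal sketch using setpriority(2) could look like the following (the -20 value is just an example; negative values need root or CAP_SYS_NICE):

```c
#include <stdio.h>
#include <sys/resource.h>   /* setpriority, PRIO_PROCESS */

int main(void)
{
    /* -20 is the most favourable nice value; requires privilege. */
    if (setpriority(PRIO_PROCESS, 0, -20) != 0)
        perror("setpriority");

    /* ... run the benchmarked code here ... */
    return 0;
}
```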


The best way to reduce variations caused by the system environment would be running your benchmark in "single-user" mode, also known as runlevel 1 or "recovery mode".

You can boot into this mode by passing "-s" as a boot time option to the kernel, or you can switch a running system to it with "init 1".

In this mode, all daemons are stopped, and you are logged in as root. Pretty much anything that runs on the system runs from your interactive terminal.


Please make sure you deactivate frequency scaling in the BIOS and the operating system. Also it sounds like you are using a P4, so make sure you turn off hyperthreading.

I have encountered performance variations like you describe in the past, due to such things.

This page describes how to turn it on, which should give you what you need to turn it off.

You will also need to reboot your machine and look in the BIOS settings to determine whether it is doing it automatically, without the operating system knowing.
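On the operating-system side, one quick way to see whether frequency scaling is still active is to read the current cpufreq governor from sysfs. A small sketch, assuming your kernel exposes cpufreq at the usual sysfs path (the file may be absent on very old kernels or when scaling is disabled):

```c
#include <stdio.h>

int main(void)
{
    /* Path used by the cpufreq subsystem on most Linux kernels. */
    const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    char governor[64] = "";
    FILE *f = fopen(path, "r");

    if (!f) {
        perror(path);   /* no cpufreq here: scaling may already be off */
        return 1;
    }
    if (fgets(governor, sizeof(governor), f))
        printf("cpu0 governor: %s", governor);
    fclose(f);
    return 0;
}
```

For stable timings you generally want this to report performance rather than ondemand or powersave (e.g. set with cpufreq-set -g performance, as mentioned in an earlier answer).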


Have you considered running the code inside Valgrind's cachegrind or callgrind tools? These should be able to provide you with accurate instruction counts by running the code through Valgrind's "VM".
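If you go the callgrind route, you can also limit counting to just the region you care about with Valgrind's client-request macros, which compile to no-ops when the program is not run under Valgrind. A sketch, assuming the Valgrind development headers are installed and the binary is run with valgrind --tool=callgrind --collect-atstart=no (the code_under_test() function is a placeholder):

```c
#include <valgrind/callgrind.h>

static volatile unsigned long sink;

static void code_under_test(void)
{
    /* placeholder for the real workload */
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

int main(void)
{
    CALLGRIND_TOGGLE_COLLECT;   /* start collecting event counts */
    code_under_test();
    CALLGRIND_TOGGLE_COLLECT;   /* stop collecting */
    CALLGRIND_DUMP_STATS;       /* write out the counts for this region */
    return 0;
}
```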

