Outliers during Performance Evaluation
I am trying to do some performance measurements using Intel's RDTSC instruction, and the variation I get between different test runs is quite odd. In most cases my benchmark in C needs 3000000 Mio cycles; however, exactly the same execution can in some cases take 5000000, almost twice as many. I tried to have no intense workloads running in parallel so that I get good performance estimates. Does anyone have an idea where these huge timing variations come from? I know that interrupts and the like can happen, but I did not expect such huge variations in timing!
PS: I am running this on a Pentium processor under Linux.
Thanks for feedback, John
I think the answer is in:
I tried to have no intense workloads running in parallel
You have insufficient control over this in a modern OS.
According to this Wikipedia article, the time stamp counter read by RDTSC cannot be used reliably for benchmarking on multi-core systems: there is no guarantee that all cores hold the same value in their time stamp registers.
On Linux, it is better to use the POSIX clock_gettime function.
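A minimal sketch of timing with clock_gettime, assuming CLOCK_MONOTONIC is available on your system. The helper names now_ns and time_once_ns are made up for illustration and are not part of any library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for the clock_gettime declaration */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in nanoseconds, or 0 on failure. */
static inline uint64_t now_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: wall-clock duration of one call to fn, in ns. */
static inline uint64_t time_once_ns(void (*fn)(void))
{
    uint64_t start = now_ns();
    fn();
    return now_ns() - start;
}
```

You would call time_once_ns(my_benchmark) where my_benchmark is your own void function. Note that this measures elapsed time rather than cycles, and on older glibc versions you may need to link with -lrt.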
You have to take the caches of most modern processors into account. Perhaps another process evicted your program's cache contents in the runs where you measured the long running time. As Henk pointed out, a lot happens in a modern OS that you can't control.
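Since you cannot rule out such disturbances, one common way to cope (a suggestion, not something the answers above prescribe) is to repeat the measurement many times and report the minimum or median instead of a single run. A rough sketch, where median_u64 is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts samples in place (ascending) and returns the
 * median (the upper middle element when n is even). Requires n > 0.
 * After the call, samples[0] holds the minimum. */
static inline uint64_t median_u64(uint64_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[n / 2];
}
```

With samples such as 3000000, 5000000, 3000100, 2999900 and 3000050, the median is 3000050, so the single 5000000 outlier no longer dominates the result.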