Timing and Benchmarking
If I want to compare the speed of two algorithms, what's the best way? I am familiar with the mathematical way, and I know I don't full master it yet =/ I'm interested in knowing how to time two algorithms with good precision on Windows. What are the best APIs to use, is time() from stdio.h reliable or do I need something 开发者_运维技巧better?
An example would be good too! Thank you!
Unfortunately this is not such an easy task.
As previously mentioned QueryPerformanceCounter is a viable choice.
Other possibilities are:
- GetTickCount (not advisable, since its accuracy is worse than 30ms)
- timeGetTime: By default it has a 15ms accuracy. This corresponds to the default time allocated by the task scheduler (15ms on my computer). You can force timeGetTime to be more accurate, by changing a system wide setting that rules the time allocated by the scheduler : a call to timeBeginPeriod can do this. However, this should only be used as a temporary system hack! Please do not use it in your release code!
- Query the processor's time stamp counter : this requires assembler programming and I would not recommend it.
As far as QueryPerformanceCounter is concerned, you can find an easy to use wrapper here : http://www.codeproject.com/KB/datetime/perftimer.aspx
You can use it this way :
CPerfTimer t;
t.Start();
CallExpensiveTask();
std::cout << "Time (ms) " << t.Elapsedms();
A few advices however :
- As mentionned, run your function a lot of times in order to get a reliable measure
- Beware of the task scheduler : On an average computer each process is given 15ms before switching to another process. If the task scheduler switches task during the call to your measured function, you might measure a time that is much higher (around 15ms higher)
- Note that sleep(1) yields to a 15ms pause (since the scheduler will switch to a different process)
- Remember that QueryPerformanceCounter can (rarely) give inaccurate results : On multicore processors, you might sometimes see a negative time lapse (!). In that case, you should redo your measure (see http://www.virtualdub.org/blog/pivot/entry.php?id=106 )
- Temporary solutions to counteract the scheduler effect: - Raise your process priority (you can do it through the task manager) - Hack the scheduler : the timeBeginPeriod (http://msdn.microsoft.com/en-us/library/dd757624(v=VS.85).aspx) provided by microsoft has the power to change the time allocated to each task from 15ms to lower values (remember to not include this is your release code, since this is a system wide setting that can reduce the global performance...)
Some more notes about the processor's time stamp counter
- I did not test its accuracy myself, but http://en.wikipedia.org/wiki/Time_Stamp_Counter is a good source of information about it, and about its limitations (esp. on multicore processors, and also on processors with variable clock frequencies)
- AFAIK, it is advisable to use this kind of timer on a single thread for which you set the processor affinity
- An example of implementation (found at http://developer.nvidia.com/object/timer_function_performance.html) could be:
#pragma warning (disable : 4035) // disable no return value warning
__forceinline DWORD GetPentiumCounter()
{
__asm
{
xor eax,eax // VC won't realize that eax is modified w/out this
// instruction to modify the val.
// Problem shows up in release mode builds
_emit 0x0F // Pentium high-freq counter to edx;eax
_emit 0x31 // only care about low 32 bits in eax
xor edx,edx // so VC gets that edx is modified
}
}
#pragma warning (pop)
What you are looking for is QueryPerformanceCounter
and QueryPerformanceFrequency
. These are Windows API functions to access high resolution timers. They should yield better precision then time
or GetSystemTime
.
LARGE_INTEGER time1;
QueryPerformanceCounter(&time1);
// Your code
LARGE_INTEGER time2;
QueryPerformanceCounter(&time2);
LARGE_INTEGER ticksPerSecond;
QueryPerformanceFrequency(&ticksPerSecond);
double seconds = (double)(time2.QuadPart - time1.QuadPart) / ticksPerSecond.QuadPart;
And here I found an implementation of a CStopWatch
class that could be used as an example.
QueryPerformanceCounter/QueryPerformanceFrequency. AFAIK, they are fixed now, and actually measure cycles. Just remember to rise the priority of the thread so that it would have less impact from other threads while you measure.
Preempting can cause quite a bit of error if the runtime of the measurable code is low.
So rise priority and rerun tests multiple times.
精彩评论