Timings reported by profiler vs true timings - why the discrepancy?
I have two chunks of code that do the same operation. One chunk was written by myself, the other by a third party. They are both compiled into a single executable. The third-party code appears to do its job much faster than mine: it can perform 1,500 operations per second compared to my 500. I then ran the executable within VTune, using the call-graph profiling option, hoping this would reveal where I was wasting time. Unfortunately the VTune diagnostics, which show the number of microseconds each function takes, claim that both my function and the third-party function take about 0.002 seconds per call. That appears spot on for my code but is completely at odds with my (manual) measurement of the speed of the third-party code.
How can this happen?
EDIT: both chunks of code are large and call their own complex trees of sub functions.
EDIT: I should point out that the third party code is pure C++ whereas my code is essentially C code that has just been compiled in a C++ compiler.
EDIT: VTune is a very complex package with loads of configuration options I don't understand. Might there be some settings to play with that may reduce this inaccuracy?
Your definition of 'true timings' might need revision. You cannot claim that the profiler is wrong when comparing apples and pears.
Profilers can be used for relative timing; use a profiler to find the 'hot-spot' in your code, then use the information to optimize that area.
On a practical note: look for a sampling profiler, which usually has much less overhead/impact than a tracing/instrumenting profiler.
(PS Also read up on Schrodinger/Heisenberg)
I've seen cases where profilers artificially inflate the reported time for certain functions/system calls. It could be that the third-party library is using some such call and getting pegged for it.
Have you tried using the high-performance clock (gethrtime on Solaris or QueryPerformanceCounter on Windows) and measuring the total times of the functions as a sanity check?
Your operations sound really slow to be CPU bound - are they I/O bound? Is your I/O code less optimized than the library's? That wouldn't necessarily show up in a CPU profile report at all.
If you are using wall time (i.e., elapsed seconds instead of CPU counters), you also need to account for time spent in blocking system calls. For example, assuming you aren't doing much file I/O, you are probably spending a lot of time printing information to the console. Console I/O will not show up as CPU time since most of that time is simply waiting to update the console.
You can use GetThreadTimes(...) to determine how much time you are spending in your code vs. system code. I have used this and system-call sampling to reduce context switches (and ultimately increase overall performance).