
Use callgrind as a sampling profiler?

I've been searching for a Linux sampling profiler, and callgrind has come the closest to showing useful results. However, its overhead is estimated at 20--100x normal execution time. Additionally, I'm only interested in time spent per function (with particular emphasis on blocking calls such as read() and write(), which no other profiler will faithfully display).

  1. Is there a way to turn off excess options, so that just the minimum data is recorded for generating times spent in various call stacks?
  2. Does callgrind's cachegrind heritage imply that excess stuff is being done with regards to cache profiling etc?
  3. I assume callgrind operates like a debugger. Can this be adjusted to sample the process at intervals, rather than every single instruction?


3) Callgrind works like a dynamic translator that instruments the original code with counting code. Instrumentation is done for each memory-access instruction (for the cache simulation) and, I suspect, for each jump-like instruction, to track the execution count of every basic block.

I have a small sampling profiler which acts much like a debugger: it injects a setitimer() profiling timer into the application, intercepts every SIGALRM, and prints the current $eip value.
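A minimal sketch of that idea, assuming x86-64 Linux/glibc. The answer above intercepts SIGALRM (i.e. ITIMER_REAL); this sketch uses ITIMER_PROF/SIGPROF instead so that samples track CPU time, but the mechanism is the same:

```c
/* Minimal in-process sampling sketch (assumes x86-64 Linux/glibc;
 * use REG_EIP instead of REG_RIP on 32-bit x86). A profiling timer
 * fires SIGPROF periodically and the handler records the interrupted
 * instruction pointer. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>

static void on_prof(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)sig; (void)si;
    /* fprintf is not async-signal-safe; acceptable for a demo only. */
    fprintf(stderr, "sample: ip=%#llx\n",
            (unsigned long long)uc->uc_mcontext.gregs[REG_RIP]);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_prof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    /* Sample every 10 ms of CPU time consumed by the process. */
    struct itimerval it = { { 0, 10000 }, { 0, 10000 } };
    setitimer(ITIMER_PROF, &it, NULL);

    volatile unsigned long x = 0;            /* workload to be sampled */
    for (long i = 0; i < 500000000L; i++)
        x += i;
    return 0;
}
```

Mapping the sampled addresses back to function names (for example with addr2line) is left out here.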

There were some earlier sampling profilers using the setitimer() approach, and there is also profil() for something similar. It is used by glibc's gmon/gmon.c and by gprof-style profiling (to be exact, by gcc -pg). The profil() function can profile a single contiguous code fragment, sampling virtual CPU time every 1 or 10 milliseconds. There is also an sprofil() function.
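A rough illustration of the profil() interface, assuming glibc on Linux, that the GNU linker provides the __executable_start symbol marking the start of the text segment, and that (as described in the profil(3) man page) a scale of 0x10000 maps program-counter words 1:1 onto the unsigned short histogram buckets:

```c
/* Sketch of sampling with glibc's profil(). Assumptions: GNU ld
 * provides __executable_start; scale 0x10000 gives a 1-to-1 mapping
 * of PC words to histogram words, per profil(3). */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>

extern char __executable_start;          /* start of the text segment (GNU ld) */
static unsigned short hist[1 << 16];     /* PC histogram buckets */

int main(void)
{
    /* Start sampling virtual CPU time over (part of) the text segment. */
    profil(hist, sizeof hist, (size_t)&__executable_start, 0x10000);

    volatile double x = 0;               /* workload to be sampled */
    for (long i = 0; i < 50000000L; i++)
        x += i * 0.5;

    profil(NULL, 0, 0, 0);               /* disable sampling again */

    unsigned hits = 0;
    for (size_t i = 0; i < sizeof hist / sizeof hist[0]; i++)
        hits += hist[i] != 0;
    printf("buckets with samples: %u of %zu\n",
           hits, sizeof hist / sizeof hist[0]);
    return 0;
}
```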

Also check LD_PRELOAD=/lib/libpcprofile.so PCPROFILE_OUTPUT=output.file - but I don't know whether it works or how it works.

For the numbered questions:

2) "Callgrind is an extension to Cachegrind. It provides all the information that Cachegrind does, plus extra information about callgraphs." - So it can provide any stuff that is in cachegrind, but also it allow user to turn off cache simulation: --simulate-cache=no (it is the default value)

For speed: according to http://www.valgrind.org/docs/manual/nl-manual.html - the manual of the Nul valgrind tool (aka nulgrind), which adds no instrumentation - the slowdown is about 5x. That is because the program is dynamically translated by valgrind itself, so no valgrind tool can be faster than nulgrind.


Have you tried gprof? It does not have the big overhead that valgrind does.


Try using Zoom from RotateRight. It has a "Thread Time" configuration that samples all threads in a single process whether they are running or blocked.
