Building your own profiler: how to catch events?
I couldn't really get an answer to this question, so I'll attempt to write a custom, although simple, profiler. Just to get started: suppose I need to find out, without recompiling, how many cores (and which ones) are running my code. Suppose also I'd like to catch when a given function is executed. Finally, any thoughts about dealing with threads? Any other tips as to how to start? C is my language of choice, and I'm running Linux. Thanks.
Edit: Oprofile, CallGrind, Helgrind, gprof, papi, tau, and others I've analyzed seem not to match my needs.
I'm sure you've seen this before.
I find it helpful to distinguish two different objectives:
1. Measuring how long various things take, so you can make a presentation. As part of this presentation you might say something like "It looks like the frob routine is taking too much time, or being called too many times, suggesting we try to speed that up or call it less."
2. Pinpointing precise lines of code or instructions that are (a) not necessary, and (b) worth fixing, in the sense that they will save a good fraction of execution time.
I suspect the overall goal is the latter. But to do that, measuring is a very indirect approach. Instead, you could take advantage of the fact that, if something's wasting enough time to be worth looking at, you can simply catch it by taking snapshots of the program's state.
So you're not measuring in order to find what's taking time. The very fact that it takes time is what exposes it, unambiguously, with no suggesting involved.
Zoom is a profiler that works this way. So is LTProf. I built one once, but frankly I think the manual method, while more work, is more effective, because it makes me think harder about why the program's doing what it's doing.
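If you want to automate the snapshot idea rather than pausing by hand, here is a minimal sketch, assuming Linux with glibc. It uses a SIGPROF timer to dump the interrupted call stack to stderr at a fixed interval of consumed CPU time. The names start_snapshots, stop_snapshots, snapshot_count and the frob stand-in are made up for illustration. backtrace is not formally async-signal-safe, so the sketch calls it once up front to get the unwinder loaded before the handler runs; treat that as a common workaround, not a guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_FRAMES 32

static volatile sig_atomic_t snapshots_taken = 0;

/* SIGPROF handler: write the interrupted call stack to stderr.
 * The first frames listed belong to the handler and signal trampoline. */
static void take_snapshot(int signo)
{
    void *frames[MAX_FRAMES];
    (void)signo;
    int depth = backtrace(frames, MAX_FRAMES);
    /* backtrace_symbols_fd writes straight to the fd without calling malloc. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    if (write(STDERR_FILENO, "----\n", 5) < 0) {
        /* nothing sensible to do inside a signal handler */
    }
    snapshots_taken++;
}

/* Take a snapshot every interval_ms of CPU time used by the process.
 * Returns 0 on success, -1 on failure. */
int start_snapshots(long interval_ms)
{
    assert(interval_ms > 0);

    /* Assumption: one call outside the handler forces any lazy loading
     * that backtrace does on first use. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = take_snapshot;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the profiling timer. Returns 0 on success. */
int stop_snapshots(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

long snapshot_count(void)
{
    return snapshots_taken;
}

/* Hypothetical hot spot standing in for the "frob routine". */
double frob(long iterations)
{
    volatile double acc = 0.0;
    for (long i = 0; i < iterations; i++)
        acc += (double)i * 0.5;
    return acc;
}
```

Call start_snapshots(10) near the top of main, run the workload, then call stop_snapshots(). Whatever shows up in most of the dumped stacks is where the time goes. Linking with -rdynamic lets backtrace_symbols_fd print function names; without it you mostly get raw addresses, which addr2line can translate.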
You should try Linux's perf: https://perf.wiki.kernel.org/index.php/Tutorial
This tool has direct support from the kernel and knows about page-faults, CPU-migrations, and context-switches (e.g. look at the output of perf stat). These stats can be aggregated per-process or per-cpu. perf record can be used like oprofile.
To add simple profiling of your own, you can use setitimer (the sampling signal is process-wide) or timer_create (the timer signal can be directed at a specific thread). There is no portable way to ask which physical CPU a thread is running on, although on Linux glibc's sched_getcpu returns the CPU the calling thread is on at the moment of the call. At every sample you can also get per-thread run times from getrusage with RUSAGE_THREAD.
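A rough sketch of the setitimer route, assuming Linux with glibc: a SIGPROF handler counts samples per CPU using sched_getcpu, and a helper reads the calling thread's CPU time through getrusage with RUSAGE_THREAD. The function names and MAX_CPUS are invented for this example. sched_getcpu is not on the POSIX async-signal-safe list, so calling it from the handler is an assumption that happens to hold in practice on current glibc. timer_create is left out to keep this short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

#define MAX_CPUS 1024   /* arbitrary upper bound chosen for this sketch */

static volatile sig_atomic_t samples_per_cpu[MAX_CPUS];
static volatile sig_atomic_t total_samples;

/* SIGPROF handler: note which CPU the interrupted thread was on. */
static void on_sample(int signo)
{
    (void)signo;
    int cpu = sched_getcpu();   /* Linux/glibc specific, -1 on error */
    if (cpu >= 0 && cpu < MAX_CPUS)
        samples_per_cpu[cpu]++;
    total_samples++;
}

/* Sample every interval_us microseconds of process CPU time.
 * Returns 0 on success, -1 on failure. */
int start_cpu_sampling(long interval_us)
{
    assert(interval_us > 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_us / 1000000;
    tv.it_interval.tv_usec = interval_us % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the sampling timer. Returns 0 on success. */
int stop_cpu_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

/* Number of samples that landed on the given CPU. */
long samples_on_cpu(int cpu)
{
    return (cpu >= 0 && cpu < MAX_CPUS) ? samples_per_cpu[cpu] : 0;
}

long samples_total(void)
{
    return total_samples;
}

/* User plus system CPU time consumed so far by the calling thread,
 * in microseconds, or -1 on error. */
long long thread_cpu_time_us(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) != 0)
        return -1;
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

With ITIMER_PROF the signal goes to whichever thread the kernel picks among those consuming CPU, so in a multithreaded program the histogram describes the process as a whole. For per-thread attribution you would look at timer_create instead, or call thread_cpu_time_us from each thread.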