How to modify a C program so that gprof can profile it?

2022-12-12 18:13 问答作者：

When I run gprof on my C program it says no time accumu开发者_C百科lated for my program and shows 0 time for all function calls. However it does count the function calls.

How do I modify my program so that gprof will be able to count how much time something takes to run?

Did you specify -pg when compiling?

http://sourceware.org/binutils/docs-2.20/gprof/Compiling.html#Compiling

Once it is compiled, you run the program and then run gprof on the binary.

E.g.:

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 10000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Compile as cc -pg test.c, then run as a.out, then gprof a.out, gives me

granularity: each sample hit covers 4 byte(s) for 1.47% of 0.03 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 45.6       0.02     0.02    10000     0.00     0.00  __sys_write [10]
 45.6       0.03     0.02        0  100.00%           .mcount (26)
  2.9       0.03     0.00    20000     0.00     0.00  __sfvwrite [6]
  1.5       0.03     0.00    20000     0.00     0.00  memchr [11]
  1.5       0.03     0.00    10000     0.00     0.00  __ultoa [12]
  1.5       0.03     0.00    10000     0.00     0.00  _swrite [9]
  1.5       0.03     0.00    10000     0.00     0.00  vfprintf [2]

What are you getting?

I tried running Kinopiko's example, except I increased the number of iterations by a factor of 100.

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 1000000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Then I took 10 stackshots (under VC, but you can use pstack). Here are the stackshots:

9 copies of this stack:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

1 copy of this stack:
KERNEL32! 7c81cb96()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

In case it isn't obvious, this tells you that:

mainCRTStartup() line 206 + 25 bytes Cost ~100% of the time
main() line 7 + 14 bytes             Cost ~100% of the time
printf() line 62 + 14 bytes          Cost ~100% of the time
_ftbuf() line 171 + 9 bytes          Cost ~100% of the time
_flush() line 162 + 23 bytes         Cost ~100% of the time
_write() line 168 + 57 bytes         Cost ~100% of the time

In a nutshell, the program spends ~100% of it's time flushing to disk (or console) the output buffer as part of the printf on line 7.

(What I mean by "Cost of a line" is - it is the fraction of total time spent at the request of that line, and that's roughly the fraction of samples that contain it. If that line could be made to take no time, such as by removing it, skipping over it, or passing its work off to an infinitely fast coprocessor, that time fraction is how much the total time would shrink. So if the execution of any of these lines of code could be avoided, time would shrink by somewhere in the range of 95% to 100%. If you were to ask "What about recursion?", the answer is It Makes No Difference.)

Now, maybe you want to know something else, like how much time is spent in the loop, for example. To find that out, remove the printf because it's hogging all the time. Maybe you want to know what % of time is spent purely in CPU time, not in system calls. To get that, just throw away any stackshots that don't end in your code.

The point I'm trying to make is if you're looking for things you can fix to make the code run faster, the data gprof gives you, even if you understand it, is almost useless. By comparison, if there is some of your code that is causing more wall-clock time to be spent than you would like, stackshots will pinpoint it.

One gotcha with gprof: it doesn't work with code in dynamically-linked libraries. For that, you need to use sprof. See this answer: gprof : How to generate call graph for functions in shared library that is linked to main program

First compile you application with -g, and check what CPU counters are you using. If your application runs very quick than gprof could just miss all events or less that required (reduce the number of events to read).

Actually profiling should show you CPU_CLK_UNHALTED or INST_RETIRED events without any special switches. But with such data you'll be able only to say how well your code it performing: INST_RETIRED/CPU_CLK_UNHALTED.

Try to use Intel VTune profiler - it's free for 30 days and for education.

继续阅读：c gprof profiler profiling

How to modify a C program so that gprof can profile it?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？