
What's the best way to measure and track performance over various calls at runtime?

I'm trying to optimize the performance of my code, but I'm not familiar with Xcode's debuggers or debuggers in general. Is it possible to track the execution time and frequency of calls being made at runtime?

Imagine a chain of events with some recursive calls over a fraction of a second. What's the best way to track where the CPU spends most of its time?

Many thanks.

Edit: Maybe this is better asked by saying, how do I use the Xcode debug tools to do a stack trace?


You want to use the built-in performance tools called 'Instruments'; check out Apple's guide to Instruments. Specifically, you probably want the System Instruments. There's also the Tuning Guide, which could be useful to you, and Shark.


Imagine a chain of events with some recursive calls over a fraction of a second. What's the best way to track where the CPU spends most of its time?

Short version of a previous answer:

  1. Learn an IDE or debugger. Make sure it has a "pause" button, or some other way to interrupt your program while it is running and taking too long.

  2. If your code runs too quickly to be manually paused, wrap a temporary loop of 10 to 1000 iterations around it (a short sketch follows below).

  3. When you pause it, copy the call stack into a text editor. Repeat several times.

Your answer will be in those stacks. If the CPU is spending most of its time in a statement, that statement will be at the bottom of most of the stack samples. If there is some function call that causes most of the time to be used, that function call will be on most of the stacks. It doesn't matter if it's recursive - that just means it shows up more than once on a stack.
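
As an illustration of step 2 above, here is a minimal Swift sketch of the temporary scaffolding; processDocument() is a hypothetical stand-in for whatever code you are investigating, not anything from the question:

    // Temporary scaffolding for manual stack sampling; delete it when done.
    // processDocument() is a hypothetical placeholder for the code under study.
    func processDocument() {
        // ... the work you suspect is slow ...
    }

    for _ in 0..<1000 {          // 10 to 1000 repetitions, per step 2
        processDocument()        // hit "pause" in the debugger while this loop runs
    }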

Don't think about measuring microseconds, or counting calls. Think about "percent of time active". That's what stack samples tell you, and that's roughly what you'll save if you fix it.

It's that simple.

BTW, when you fix that problem, you will get a speedup factor. Then, other issues in your code will be magnified by that factor, so they will be easier to find. This way, you can keep going until you've squeezed every cycle out of it.
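
To make that magnification concrete, here is a small sketch with made-up numbers (the 50% and 20% costs are assumptions, not measurements):

    // Hypothetical costs: problem A takes 50% of run time, problem B takes 20%.
    let total = 1.0
    let problemA = 0.50
    let problemB = 0.20

    // Fixing A alone: the program now takes 0.5 of its old time, a 2x speedup.
    let afterA = total - problemA
    let speedupA = total / afterA              // 2.0

    // B's absolute cost is unchanged, but it is now 40% of the shorter run,
    // so it stands out twice as clearly in the next round of stack samples.
    let bShareAfterA = problemB / afterA       // 0.4

    // Fixing B too compounds the factors: 1 / 0.3, roughly 3.3x overall.
    let overallSpeedup = total / (afterA - problemB)
    print(speedupA, bShareAfterA, overallSpeedup)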


The first thing I tell people is to recognize the difference between

1) timing routines and counting how many times they are called, and

2) finding code that you can fruitfully optimize.

For (1) there are instrumenting profilers. To be really successful at (2) you need a rare type of profiler. You need a sampling profiler that

  • samples the entire call stack, not just the program counter

  • samples at random wall clock times, not just CPU, so as to capture possible I/O problems

  • samples when you want it to (not when waiting for user input)

  • for output, gives you, for each line of code that appears on stack samples, the percent of samples containing that line. That is a direct measure of the total time that could be saved if that line were not there. (A rough sketch of this bookkeeping appears below.)

(I actually do it by hand, interrupting the program under the debugger.)
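
For a sense of what that output looks like, here is a rough Swift sketch of the bookkeeping: each sample is the set of call sites (file:line) copied from one paused stack, and the names and samples are invented purely for illustration.

    // Invented samples: each entry is the set of call sites on one paused stack.
    let samples: [Set<String>] = [
        ["main.swift:10", "parser.swift:52", "string.swift:7"],
        ["main.swift:10", "parser.swift:52", "string.swift:7"],
        ["main.swift:10", "render.swift:120"],
        ["main.swift:10", "parser.swift:52", "regex.swift:33"],
    ]

    // Count, for each line of code, how many samples it appears on.
    var appearances: [String: Int] = [:]
    for stack in samples {
        for line in stack {
            appearances[line, default: 0] += 1
        }
    }

    // Percent of samples containing each line: an estimate of the total time
    // that could be saved if that line were not there.
    for (line, count) in appearances.sorted(by: { $0.value > $1.value }) {
        let percent = 100.0 * Double(count) / Double(samples.count)
        print("\(line): \(percent)%")
    }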

Don't get sidetracked by problems you don't have, such as

  • accuracy of measurement. If a line of code appears on 30% of call stack samples, its actual cost could be anywhere in a range around 30%. If you can find a way to eliminate it or invoke it a lot less, you will save what it costs, even if you don't know in advance exactly what that cost is. (A small simulation after this list illustrates the point.)

  • efficiency of sampling. Since you don't need accuracy of time measurement, you don't need a large number of samples. Even a small number of samples is enough, because the costly lines of code cannot fail to show up on them.

  • call graphs. They make nice graphics, but are not what you need to know. An arc on a call graph corresponds to a line of code in the best case, usually to multiple lines, so knowing the cost of an arc only tells you the cost of a line in the best case. Call graphs concentrate on functions, when what you need to find is lines of code. Call graphs also get wrapped up in the issue of recursion, which is irrelevant.
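
On the accuracy point above, a toy simulation (the 30% cost and 10 samples are assumed numbers, not a real measurement) shows why a costly line is hard to miss even with very few samples:

    // Toy simulation: a line that truly costs 30% of run time, observed with
    // only 10 random stack samples, repeated over many trials.
    let trueCost = 0.30
    let sampleCount = 10
    let trials = 10_000

    var hits = 0                          // trials where the line appears >= 2 times
    for _ in 0..<trials {
        var seen = 0
        for _ in 0..<sampleCount where Double.random(in: 0..<1) < trueCost {
            seen += 1
        }
        if seen >= 2 { hits += 1 }
    }

    // The measured fraction wanders around 30%, but the line still shows up
    // on two or more of the 10 samples in the large majority of trials.
    print("Seen at least twice in \(100 * hits / trials)% of trials")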

It's important to understand what to expect. Many programmers, using traditional profilers, can get a 20% improvement, consider that terrific, count the profiler a winner, and stop there. Others, working with large programs, can often get speedup factors of 20 times. This is done by fixing a series of problems, each one giving a multiplicative speedup factor. As soon as the profiler fails to find the next problem, the process stops. That's why "good enough" isn't good enough.

Here is a brief explanation of the method.
