Ways to corner a stickiness bug
How can I determine exactly what a piece of software is doing when it is stuck, unresponsive to user input and not updating its display?
I have tried oprofile, which records what function is executing, but it's not giving me enough clues. It counts everything that happens during the time it's running, when I need to see what's happening only when the specimen program is stuck.
The problem might involve interrupts, waiting on network sockets, timers, a GUI event handler, or who knows what. How to find out as much as possible about what's going on, not just the execution points of each thread?
The software of interest runs on Linux and is built with gcc. It is mostly C++ but may involve other languages, including interpreted ones such as Python.
The particular case of concern now is Firefox, for which I have checked out the source. Firefox frequently pauses all input and screen output at random times, for about 5-10 seconds each time. Even if someone handed me the solution to this particular problem on a silver platter, I would take it but still be asking. If possible, I'd like to learn general techniques that would apply to any software, especially stuff I'm responsible for.
strace will trace out the system calls. This might give some indication of what is blocking on network sockets and so on.
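To narrow a trace down to the pauses, one option is to record timing with something like "strace -f -tt -T -p 12345 -o trace.txt" (-T appends the time spent in each call as "<seconds>") and then filter for calls that took a long time. The following is only a rough sketch of such a filter; the function name slow_syscalls, the file name trace.txt and the 1-second threshold are made up for illustration.

```python
import re

# strace -T appends the time spent in each completed call as "<seconds>".
_DURATION = re.compile(r"<(\d+\.\d+)>\s*$")
# The syscall name is the first identifier directly followed by "(".
_SYSCALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\(")


def slow_syscalls(lines, threshold=1.0):
    """Return (seconds, syscall, line) for calls lasting >= threshold seconds.

    Hypothetical helper: lines that lack either a call name or a duration
    (for example "<unfinished ...>" lines) are simply skipped.
    """
    hits = []
    for line in lines:
        duration = _DURATION.search(line)
        name = _SYSCALL.search(line)
        if not duration or not name:
            continue
        seconds = float(duration.group(1))
        if seconds >= threshold:
            hits.append((seconds, name.group(1), line.rstrip()))
    # Longest calls first.
    return sorted(hits, reverse=True)
```

Used as "slow_syscalls(open('trace.txt'))", this would list the longest blocking calls first. A long poll or read on a socket descriptor during a pause would be a hint that the program is waiting on the network.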
Stack sampling should find it. Basically, while the program is spending time like that, there's almost always a hierarchy of function calls on the stack waiting for their work to be completed. Just sample the stack a few times and you'll see them.
ADDED: As Don Wakefield pointed out, the pstack utility could be perfect for this job.
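Once a few stack samples have been saved as text, a small script can count which functions show up in most of them. This is a minimal sketch that assumes the samples are in gdb "bt" format (lines beginning with "#0", "#1", and so on); tally_frames is a made-up name, and the simple pattern will not handle function names that contain spaces, such as some C++ templates.

```python
import re
from collections import Counter

# Matches gdb frame lines such as:
#   "#0  0x00007f... in poll () from /lib/libc.so.6"
#   "#2  main (argc=1, argv=0x7ffd...) at main.c:10"
_FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def tally_frames(samples):
    """Count in how many samples each function appears at least once."""
    counts = Counter()
    for text in samples:
        seen = set()
        for line in text.splitlines():
            match = _FRAME.match(line.strip())
            if match:
                seen.add(match.group(1))
        # Each function counts once per sample, however deep the recursion.
        counts.update(seen)
    return counts
```

Functions that appear in nearly every sample taken during a pause, but rarely in samples taken during normal running, are the ones worth reading first.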
A stack trace can be obtained from a running program. At a command line, use "ps aux" to find the program's PID. Suppose it's 12345. Then run:
gdb --pid=12345
When the program is stuck in a pause (or when doing anything suspicious), do a ctrl-C in gdb. The "bt" command in gdb prints the stack, which can be admired now or pasted into a text file for later study. Resume execution of the program with "c" (continue).
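The same steps can be scripted so that several samples are taken without typing in gdb each time. The sketch below assumes gdb is on the PATH and that you are allowed to attach to the process (same user or root, depending on the system's ptrace settings). The function names and the default of five samples one second apart are arbitrary choices for illustration.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches, prints every thread's stack, and exits."""
    # -batch makes gdb exit (and detach) after running the -ex command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_stacks(pid, samples=5, interval=1.0):
    """Collect `samples` backtraces of process `pid`, `interval` seconds apart."""
    traces = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one attach attempt fails
        )
        traces.append(result.stdout)
        time.sleep(interval)
    return traces
```

Calling "sample_stacks(12345)" while the program is stuck returns a list of backtrace texts that can be saved or compared with samples taken while it is running normally. Note that the target is briefly stopped each time gdb attaches.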
The main advantage of this manual technique over oprofile or other profilers is that I can get the exact call sequence during a moment of interest. A few samples during times of trouble, and a few when the program is running normally, should give useful clues.