Multithread debugging techniques
I was wondering if anyone knows of a nice survey of debugging techniques for multithreaded applications. Ideally, I'm looking for a case-based analysis: deadlocks, starvation, corrupted shared state, ...
.NET-specific or generic.
I'm not aware of an article or book that addresses what you're looking for, so here's my "lessons learned" from 12 years of multithreaded debugging on Windows (both unmanaged and managed).
As I stated in my comment, most of my "multithreaded debugging" is actually done via a manual code review, looking for these issues.
Deadlocks and Corrupted Shared State
Document lock hierarchies (both the order and what shared state they protect), and ensure they're consistent. This solves most deadlock problems and corrupted shared state problems.
(Note: the link above for "lock hierarchies" refers to a Dr. Dobb's article by Herb Sutter; he has written a whole series of Effective Concurrency articles, which I highly recommend.)
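As a rough illustration of a documented lock hierarchy, here is a minimal sketch. The Account class, its members, and the "lower Id first" rule are all hypothetical and only serve to show the idea: the comment states the order and what each lock protects, and the code always acquires locks in that order.

```csharp
using System;

// Lock hierarchy for this (hypothetical) class:
//   Level 1: _gate of the account with the lower Id
//   Level 2: _gate of the account with the higher Id
// Each _gate protects only its own account's _balance.
// Rule: never take a level 1 lock while holding a level 2 lock.
public sealed class Account
{
    private readonly object _gate = new object(); // protects _balance
    private decimal _balance;

    public Account(int id, decimal openingBalance)
    {
        Id = id;
        _balance = openingBalance;
    }

    public int Id { get; }

    public decimal Balance
    {
        get { lock (_gate) { return _balance; } }
    }

    // Assumes account Ids are unique.
    public static void Transfer(Account from, Account to, decimal amount)
    {
        if (ReferenceEquals(from, to)) return; // nothing to move

        // Order the two locks by Id, regardless of transfer direction.
        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;

        lock (first._gate)
        lock (second._gate)
        {
            from._balance -= amount;
            to._balance += amount;
        }
    }
}
```

Because both directions of a transfer take the locks in the same order, two threads transferring A to B and B to A at the same time cannot each hold the lock the other one needs.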
More on Deadlocks
Use RAII for all synchronization. This ensures that locks are released in the face of exceptions. Prefer the "lock" statement to try/finally.
(Note that RAII in .NET depends on IDisposable, not Finalize, and assumes that the client code will correctly use a using block.)
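For synchronization objects that the "lock" statement does not cover, the same idea can be sketched with IDisposable. The WriteLockScope and Settings types below are hypothetical names invented for this example; ReaderWriterLockSlim is the real .NET class.

```csharp
using System;
using System.Threading;

// Hypothetical helper: takes the write lock in the constructor and
// releases it in Dispose, so a using block gives RAII-style behavior.
public sealed class WriteLockScope : IDisposable
{
    private ReaderWriterLockSlim _rwLock;

    public WriteLockScope(ReaderWriterLockSlim rwLock)
    {
        rwLock.EnterWriteLock();
        _rwLock = rwLock;
    }

    public void Dispose()
    {
        // Exchange makes a second Dispose call a no-op.
        ReaderWriterLockSlim held = Interlocked.Exchange(ref _rwLock, null);
        if (held != null) held.ExitWriteLock();
    }
}

public sealed class Settings
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private string _value = "";

    public bool IsWriteLockHeld { get { return _rwLock.IsWriteLockHeld; } }

    public void Update(string value)
    {
        using (new WriteLockScope(_rwLock))
        {
            // Even if this throws, Dispose releases the lock.
            if (value == null) throw new ArgumentNullException("value");
            _value = value;
        }
    }

    public string Read()
    {
        _rwLock.EnterReadLock();
        try { return _value; }
        finally { _rwLock.ExitReadLock(); }
    }
}
```

The helper only protects callers that actually wrap it in a using block, which is the caveat the note above describes.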
Starvation
Remove any modifications of thread priorities. Correct prioritization is actually a bit counter-intuitive: it is best to give the thread with the most work to do a lower priority, and give higher priorities to threads that are I/O bound (including the UI thread). Since Windows does this automatically (see Windows Internals), there's really no reason for the code to get involved at all.
In General
Remove all lock-free code that was written in-house. It almost certainly contains subtle bugs. Replace it with .NET 4 lock-free collections and synchronization objects, or change the code to be lock-based.
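As a small sketch of what such a replacement might look like, the following uses ConcurrentQueue<T> from System.Collections.Concurrent in place of a hand-written lock-free queue. The WorkQueueDemo class and its method are made up for the example.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WorkQueueDemo
{
    // Enqueues from several parallel producers, then drains the queue
    // and returns how many items arrived.
    public static int EnqueueConcurrently(int producers, int itemsPerProducer)
    {
        var queue = new ConcurrentQueue<int>();

        Parallel.For(0, producers, p =>
        {
            for (int i = 0; i < itemsPerProducer; i++)
                queue.Enqueue(p * itemsPerProducer + i);
        });

        int count = 0;
        int item;
        while (queue.TryDequeue(out item)) count++;
        return count;
    }
}
```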
Use higher-level concepts for synchronization. The Task Parallel Library and unified cancellation in .NET 4 remove pretty much any need for direct usage of ManualResetEvent, Monitor, Semaphore, etc.
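To make that concrete, here is a minimal sketch of a background loop that would traditionally use a ManualResetEvent as a "stop" signal and a second event for "done". The CancellableWorker name and the 10 ms poll interval are assumptions made for the example; the cancellation token replaces the stop event and the returned Task replaces the done event.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CancellableWorker
{
    // Starts a loop that runs until cancellation is requested.
    // The task's result is the number of iterations completed.
    public static Task<int> Start(CancellationToken token)
    {
        // The token is deliberately not passed to StartNew: the loop
        // exits cooperatively, so the task always runs to completion.
        return Task.Factory.StartNew(() =>
        {
            int iterations = 0;
            while (!token.IsCancellationRequested)
            {
                iterations++;
                // Waits up to 10 ms, but wakes immediately on cancellation.
                token.WaitHandle.WaitOne(10);
            }
            return iterations;
        });
    }
}
```

A caller would create a CancellationTokenSource, pass its Token to Start, call Cancel when the work should stop, and then wait on the returned task.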
Use higher-level concepts for parallelization. The TPL and PLINQ in .NET 4 have built-in self-balancing algorithms, complete with intelligent partitioning and work-stealing queues, to provide good parallelization automatically. For the rare cases where the automatic parallelization is sub-optimal, both TPL and PLINQ expose a large number of tweakable knobs (custom partitioning schemes, long-running operation flags, etc.).
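The sketch below shows both sides of that advice under simple assumptions: a default PLINQ query, and one of the knobs (range partitioning via Partitioner.Create) for loops whose per-item work is very cheap. The ParallelSums class is a made-up name for the example.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSums
{
    // Default PLINQ: partitioning and load balancing are automatic.
    public static long SumOfSquares(int[] values)
    {
        return values.AsParallel().Sum(v => (long)v * v);
    }

    // One knob: hand each worker a contiguous index range, so the
    // delegate is invoked once per range instead of once per element.
    public static long SumOfSquaresChunked(int[] values)
    {
        if (values.Length == 0) return 0; // Partitioner.Create rejects empty ranges

        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, values.Length),  // yields Tuple<int, int> ranges
            () => 0L,                              // per-worker running sum
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local += (long)values[i] * values[i];
                return local;
            },
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

Whether the chunked version is actually faster depends on the workload, so it is worth measuring before swapping out the default.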
There is one more technique I've found useful for any class that has its methods called by different threads: document which methods run on which threads. Usually, this is added as a comment to the top of the method. Ensure each method only runs in a known thread context (e.g., "on a UI thread" or "on a ThreadPool thread" or "on the dedicated background thread"). None of the methods should say "on any thread" unless you're writing a synchronization class (and if you're writing a synchronization class, ask yourself if you really should be doing that).
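A minimal sketch of this convention follows. The Poller class is hypothetical, and the runtime check is an optional addition on top of the comments (the advice above only asks for documentation).

```csharp
using System;
using System.Threading;

public sealed class Poller
{
    private readonly Thread _worker;
    private int _pollCount; // written only on the worker thread

    public Poller()
    {
        _worker = new Thread(Run) { IsBackground = true };
    }

    // Runs on: the thread that created this Poller.
    public void Start()
    {
        _worker.Start();
    }

    // Runs on: the thread that created this Poller.
    // Blocks until the worker finishes, then returns the poll count.
    public int WaitForResult()
    {
        _worker.Join();
        return _pollCount;
    }

    // Runs on: the dedicated worker thread.
    private void Run()
    {
        for (int i = 0; i < 3; i++) PollOnce();
    }

    // Runs on: the dedicated worker thread only.
    public void PollOnce()
    {
        AssertOnWorkerThread();
        _pollCount++;
    }

    // Optional: turns the documented thread context into a checked one.
    private void AssertOnWorkerThread()
    {
        if (Thread.CurrentThread != _worker)
            throw new InvalidOperationException("Must run on the dedicated worker thread.");
    }
}
```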
Lastly, name your threads. This makes it easy to distinguish them in the VS debugger. .NET supports this via the Thread.Name property.
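For example, a small helper along these lines (NamedThreads is a made-up name) makes sure that every thread the application starts carries a name:

```csharp
using System.Threading;

public static class NamedThreads
{
    // Starts a background thread with a descriptive name. The name is
    // what the debugger's Threads window shows for this thread.
    public static Thread Start(string name, ThreadStart body)
    {
        var thread = new Thread(body) { Name = name, IsBackground = true };
        thread.Start();
        return thread;
    }
}
```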
This isn't quite what you're asking for, but you may find CHESS interesting.
You could also take a look at Intel's Thread Checker or Thread Profiler and Sun's Studio Thread Analyzer, though they are not free. Also check out this article from Intel.
I've used Helgrind, a tool that is part of Valgrind. Helgrind is a thread error detector, and I've used it once or twice to find race conditions in some of my code. It can detect the following:
- Misuses of the POSIX pthreads API.
- Potential deadlocks arising from lock ordering problems.
- Data races -- accessing memory without adequate locking or synchronisation.
http://valgrind.org/docs/manual/hg-manual.html
Obviously, this is a Linux tool for native C/C++ programs only; it does not cover Java or .NET.
I don't think that any technique can reliably detect all multithreading problems, because the code causing them is just too complicated to analyse. No tool can detect such problems in real time either, because the tool itself also needs time to run. The program being debugged would behave completely differently with the tool than without it.
I had to debug real-time problems that occurred in production only once a month! The only solution I found was to add code that detects the problem and to have the threads involved write trace information. Of course, the tracing must be EXTREMELY fast and non-blocking. Usual tools like Visual Studio are way too slow for real-time tracing, but luckily it is easy to write your own in-memory trace:
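using System.Threading;

public static class MemoryTrace {
    const int maxMessages = 0x100; // must be a power of two so that indexMask works
    const int indexMask = maxMessages - 1;
    static readonly string[] messages = new string[maxMessages];
    static int messagesIndex = -1;

    // Non-blocking: each caller claims its own slot; the oldest entries get overwritten.
    public static void Trace(string message) {
        int thisIndex = Interlocked.Increment(ref messagesIndex) & indexMask;
        messages[thisIndex] = message;
    }
}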
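Reading the buffer back out is not shown above. One possible sketch, which is my own assumption and not taken from the linked article, keeps the same ring-buffer idea and adds a Snapshot method that returns the surviving messages oldest first. The RingTrace name and the use of a 64-bit counter (so the index never wraps) are choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class RingTrace
{
    private const int MaxMessages = 0x100;          // must be a power of two
    private const int IndexMask = MaxMessages - 1;
    private readonly string[] _messages = new string[MaxMessages];
    private long _lastIndex = -1;                   // index of the newest claimed slot

    public void Trace(string message)
    {
        long index = Interlocked.Increment(ref _lastIndex);
        _messages[(int)(index & IndexMask)] = message;
    }

    // Returns the buffered messages, oldest first. Meant to be called
    // once the problem has been detected; if other threads keep tracing
    // while this runs, the result may mix in newer entries.
    public List<string> Snapshot()
    {
        long last = Interlocked.Read(ref _lastIndex);
        long first = Math.Max(0, last - MaxMessages + 1);
        var result = new List<string>();
        for (long i = first; i <= last; i++)
        {
            string message = Volatile.Read(ref _messages[(int)(i & IndexMask)]);
            if (message != null) result.Add(message); // slot claimed but not yet written
        }
        return result;
    }
}
```

A typical use would be to call Trace at the interesting points of each thread and dump Snapshot to a file or the debugger output once the detection code fires.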
A more detailed description of this approach, which also collects thread and timing information and outputs the trace nicely, is at CodeProject: Debugging multithreaded code in real time