Core dump of multithreaded application shows only one thread
I have a test application in C++ starting several threads in its main() and then sleeping in main() forever.
One of the threads is doing something that causes a segfault and a coredump is generated (ulimit -c unlimited was set previously).
I'm opening the core with gdb and see with thread apply all bt or info threads that I have only one thread (started in main()), which is impossible because at least the main() thread should be running as well.
The question is: how is it possible for the rest of the threads to be missing, and what could cause it?
The backtrace of this lonely thread seems ok, no strange stuff in it.
The OS is Red Hat Enterprise 5.3, gdb-6.8.
The reason you see only one thread is that GDB cannot distinguish threads by itself; it relies on an external library, libthread_db, provided by the thread library.
This library must be enabled at the beginning of the debugging session in order to monitor thread activity (birth, death, ...) and to communicate all thread-related information to GDB at runtime.
You should be able to read [Thread debugging using libthread_db enabled] when you try to debug any file compiled with -lpthread, but GDB doesn't even try to enable libthread_db when you debug a core dump.
It turned out to be a kernel bug in the default Red Hat Enterprise 5.3 kernel, fixed in a later Red Hat version (5.4): kernel-2.6.18-164.el5
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/5.4_Technical_Notes/index.html
1.110.1. RHSA-2009:1193: Important security and bug fix update
On 32-bit systems, core dumps for some multithreaded applications did not include all thread information. (BZ#505322)
https://bugzilla.redhat.com/show_bug.cgi?id=505322
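To check whether a given machine already runs a kernel with this fix, you can compare its release string (the output of uname -r) against 2.6.18-164.el5. Below is a minimal sketch of such a comparison. The helper names release_fields and has_core_dump_fix are made up for illustration, and the check assumes a RHEL 5 style release string; it says nothing meaningful about kernels from other distributions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: extract the leading numeric fields of a kernel
// release string, e.g. "2.6.18-164.el5" -> {2, 6, 18, 164}.
// Parsing stops at the first field that does not start with a digit
// (such as "el5").
std::vector<int> release_fields(const std::string& release) {
    std::vector<int> fields;
    std::size_t i = 0;
    while (i < release.size() &&
           std::isdigit(static_cast<unsigned char>(release[i]))) {
        int value = 0;
        while (i < release.size() &&
               std::isdigit(static_cast<unsigned char>(release[i]))) {
            value = value * 10 + (release[i] - '0');
            ++i;
        }
        fields.push_back(value);
        // Fields are separated by '.' or '-'; anything else ends the parse.
        if (i < release.size() && (release[i] == '.' || release[i] == '-')) {
            ++i;
        } else {
            break;
        }
    }
    return fields;
}

// Hypothetical helper: true if the release is at least 2.6.18-164,
// the RHEL 5.4 kernel named in the technical notes above.
// Assumption: the string comes from a RHEL 5 kernel.
bool has_core_dump_fix(const std::string& release) {
    static const int fixed_arr[] = {2, 6, 18, 164};
    const std::vector<int> fixed(fixed_arr, fixed_arr + 4);
    // std::vector compares lexicographically, field by field.
    return release_fields(release) >= fixed;
}
```

Pass it the string printed by uname -r. Keep in mind that the technical note only mentions 32-bit systems, so an older kernel does not prove that this bug is what you are hitting.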
Are you really sure that it's impossible? Maybe the problem is exactly related to the fact that the main thread exited without waiting for the other threads.
Try to run your application under Valgrind. Maybe this will help to figure out the cause of the crash.
If you don't have a signal handler for SIGSEGV with an alternate stack, which is a special case, just use strace.
strace -f myprogram
(man strace)
(We need the -f flag because threads on Linux always have system scope, i.e. they are roughly processes that run in the same memory, and -f makes strace follow them.)
Here is sample output showing a thread that exited before the crash. I highlighted the interesting part...
clone(
Process 28757 attached
child_stack=0x7fc1fc319ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fc1fc31a9e0, tls=0x7fc1fc31a710, child_tidptr=0x7fc1fc31a9e0) = 28757
[pid 28756] rt_sigprocmask(SIG_BLOCK, [CHLD],
[pid 28757] set_robust_list(0x7fc1fc31a9f0, 0x18
[pid 28756] <... rt_sigprocmask resumed> [], 8) = 0
[pid 28757] <... set_robust_list resumed> ) = 0
[pid 28756] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid 28757] madvise(0x7fc1fb91a000, 10465280, MADV_DONTNEED
[pid 28756] rt_sigprocmask(SIG_SETMASK, [],
[pid 28757] <... madvise resumed> ) = 0
[pid 28756] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 28757] _exit(0) = ?
Process 28757 detached
nanosleep({1, 0}, 0x7fffce29c4b0) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(28756, 28756, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
Now grep the output and compare the number of attached lines with the number of detached lines. If you actually do have live threads at exit (crash), I would create a Bugzilla entry (search Bugzilla first, of course).
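If you would rather not count by eye, the comparison can be automated. Here is a rough sketch that tallies the "Process <pid> attached" and "Process <pid> detached" lines from saved strace output. ThreadTally and tally_strace_threads are hypothetical names, and the code assumes the messages look like the ones in the sample above; other strace versions may format them differently.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts of the attach/detach notices that strace -f prints
// as threads start and exit.
struct ThreadTally {
    std::size_t attached;
    std::size_t detached;
    ThreadTally() : attached(0), detached(0) {}
};

// Hypothetical helper: scan strace output line by line and count
// lines containing "Process " followed somewhere by " attached"
// or " detached".
ThreadTally tally_strace_threads(std::istream& log) {
    ThreadTally tally;
    std::string line;
    while (std::getline(log, line)) {
        const std::size_t pos = line.find("Process ");
        if (pos == std::string::npos) {
            continue;  // not an attach/detach notice
        }
        if (line.find(" attached", pos) != std::string::npos) {
            ++tally.attached;
        } else if (line.find(" detached", pos) != std::string::npos) {
            ++tally.detached;
        }
    }
    return tally;
}

// Convenience overload for output that is already held in a string.
ThreadTally tally_strace_threads(const std::string& text) {
    std::istringstream log(text);
    return tally_strace_threads(log);
}
```

Feed it the text saved with something like strace -f myprogram 2> trace.log. If attached is larger than detached at the point of the crash, some threads were presumably still alive and should have shown up in the core.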