开发者

What caused the mysterious duplicate entry in my stack?

I'm investigating a deadlock bug. I took a core with gcore, and found that one of my functions seems to have called itself - even though it does not make a recursive function call.

Here's a fragment of the stack from gdb:

Thread 18 (Thread 4035926944 (LWP 23449)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x005133de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x00510017 in _L_mutex_lock_182 () from /lib/tls/libpthread.so.0
#3  0x080d653c in ?? ()
#4  0xf7c59480 in ?? () from LIBFOO.so
#5  0x081944c0 in ?? ()
#6  0x081944b0 in ?? ()
#7  0xf08f3b38 in ?? ()
#8  0xf7c3b34c in FOO::Service::releaseObject ()
   from LIBFOO.so
#9  0xf7c3b34c in FOO::Service::releaseObject ()
   from LIBFOO.so
#10 0xf7c36006 in FOO::RequesterImpl::releaseObject ()
   from LIBFOO.so
#11 0xf7e开发者_如何学Go2afbf in BAR::BAZ::unsubscribe (this=0x80d0070, sSymbol=@0xf6ded018)
    at /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_tree.h:176
...more stack

I've elided some of the names: FOO & BAR are namespaces.BAZ is a class.

The interesting part is #8 and #9, the call to Service::releaseObject(). This function does not call itself, nor does it call any function that calls it back... it is not recursive. Why then does it appear in the stack twice?

Is this an artefact created by the debugger, or could it be real?

You'll notice that the innermost call is waiting for a mutex - I think this could be my deadlock. Service::releaseObject() locks a mutex, so if it magically teleported back inside itself, then a deadlock most certainly could occur.

Some background:

This is compiled using g++ v3.4.6 on RHEL4. It's a 64-bit OS, but this is 32-bit code, compiled with -m32. It's optimised at -O3. I can't guarantee that the application code was compiled with exactly the same options as the LIBFOO code.

Class Service has no virtual functions, so there's no vtable. Class RequesterImpl inherits from a fully-virtual interface, so it does have a vtable.


Stacktraces are unreliable on x86 at any optimization level: -O1 and higher enable -fomit-frame-pointer.


The reason you get "bad" stack is that __lll_mutex_lock_wait has incorrect unwind descriptor (it is written in hand-coded assembly). I believe this was fixed somewhat recently (in 2008), but can't find the exact patch.

Once the GDB stack unwinder goes "off balance", it creates bogus frames (#2 through #8), but eventually stumbles on a frame which uses frame pointer, and produces correct stack trace for the rest of the stack.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜