Debugging a crash when a library is opened via dlopen on OSX
I have a problem with a C++ application I've developed which uses dlopen to load user-developed libraries. The application has been used by a variety of people on a variety of Linux distros and versions of OSX over the last couple of years, so I'm assuming my usage of dlopen is OK, as is the code that depends on it (yeah, this is hubris, so I'll report back when it fails). The problem I have now is that a user has developed a library which does not load on my system (OSX 10.6.4). When the application tries to load it, it freezes and then crashes. The thread that crashes looks like this in the crash report:
Thread 5 Crashed:
0 com.apple.CoreFoundation 0x00007fff80fa6110 __CFInitialize + 1808
1 dyld 0x00007fff5fc0d5ce ImageLoaderMachO::doImageInit(ImageLoader::LinkContext const&) + 138
2 dyld 0x00007fff5fc0d607 ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 27
3 dyld 0x00007fff5fc0bcec ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 236
4 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
5 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
6 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
7 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
8 dyld 0x00007fff5fc0bc9d ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) + 157
9 dyld 0x00007fff5fc0bda6 ImageLoader::runInitializers(ImageLoader::LinkContext const&) + 58
10 dyld 0x00007fff5fc08fbb dlopen + 573
11 libSystem.B.dylib 0x00007fff816492c0 dlopen + 61
12 cast-server-c++ 0x0000000100007819 cast::loadLibrary(std::string const&) + 96 (ComponentCreator.cpp:43)
13 cast-server-c++ 0x00000001000079c7 cast::createComponentCreator(std::string const&) + 24 (ComponentCreator.cpp:87)
14 cast-server-c++ 0x00000001000089c5 cast::CASTComponentFactory::createBase(std::string const&, std::string const&, Ice::Current const&) + 197 (CASTComponentFactory.cpp:27)
15 cast-server-c++ 0x00000001000090e9 cast::CASTComponentFactory::newManagedComponent(std::string const&, std::string const&, bool, Ice::Current const&) + 73 (CASTComponentFactory.cpp:62)
16 libCDL.dylib 0x00000001009ceb6c cast::interfaces::ComponentFactory::___newManagedComponent(IceInternal::Incoming&, Ice::Current const&) + 218 (CDL.cpp:14904)
17 libCDL.dylib 0x00000001009cf1d0 cast::interfaces::ComponentFactory::__dispatch(IceInternal::Incoming&, Ice::Current const&) + 572 (CDL.cpp:15057)
18 libIce.3.3.1.dylib 0x00000001000c9078 IceInternal::Incoming::invoke(IceInternal::Handle<IceInternal::ServantManager> const&) + 2004 (Incoming.cpp:484)
19 libIce.3.3.1.dylib 0x0000000100091a5d Ice::ConnectionI::invokeAll(IceInternal::BasicStream&, int, int, unsigned char, IceInternal::Handle<IceInternal::ServantManager> const&, IceInternal::Handle<Ice::ObjectAdapter> const&) + 367 (ConnectionI.cpp:2436)
20 libIce.3.3.1.dylib 0x000000010009bb40 Ice::ConnectionI::message(IceInternal::BasicStream&, IceInternal::Handle<IceInternal::ThreadPool> const&) + 416 (ConnectionI.cpp:1105)
21 libIce.3.3.1.dylib 0x00000001001a9bbc IceInternal::ThreadPool::run() + 3470 (ThreadPool.cpp:523)
22 libIce.3.3.1.dylib 0x00000001001aa4ec IceInternal::ThreadPool::EventHandlerThread::run() + 152 (ThreadPool.cpp:782)
23 libIceUtil.3.3.1.dylib 0x00000001006eb1e9 startHook + 128 (Thread.cpp:375)
24 libSystem.B.dylib 0x00007fff8167c456 _pthread_start + 331
25 libSystem.B.dylib 0x00007fff8167c309 thread_start + 13
(I can post the full log if needed, but it exceeds the body text limit if I include it in my post)
In the terminal where I'm running the executable, the crash produces no output except for the notification that the script running the executable has trapped a signal.
My question is how do I get more information on what might be causing this crash? I'm also happy if someone can suggest possible solutions, but to start with I'd at least like to know how to generate more information when the system crashes about what is actually wrong.
If I run otool on the library which is initially being opened by dlopen, everything looks fine (no missing links, symbols, etc.). My main suspicion is that the particular combination of libraries the loaded library is linked against is somehow causing this crash: other libraries that use different subsets of these linked-against libraries load without problems. For the record, the libraries include X11, ZeroC's Ice, Player/Stage and OpenCV (with the latter two compiled manually, and their dependencies installed using MacPorts). It seems to be the inclusion of OpenCV that causes the problem, as other libraries which link to all of these except OpenCV can be loaded with no problems. These are my suspicions, but I currently lack the know-how to investigate further.
Thanks! Nick
UPDATE: Thanks to Kaelin's answer (the DYLD_PRINT_* options, which I wasn't previously aware of) I was able to at least confirm that nothing completely obvious was happening. Using the debug information I was able to narrow the problem down to one particular library which was causing the crash. It turned out that this library (libdc1394, linked into my app via libhighgui in OpenCV) wasn't correctly linked against CoreServices, and this was causing the crash. For some reason the error was then hidden by other things, causing the ultimate crash. For info on the libdc1394 problem, look here. Unfortunately I couldn't make a clean fix that I can report here, so I just managed to get a version of the app running that didn't link to the dodgy library (by turning off libdc1394 in the OpenCV compilation).
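One way to do this kind of narrowing-down, alongside dyld's own output, is a tiny standalone harness that dlopens each suspect library on its own, in dependency order, and reports the first one that fails. This is a sketch, not part of the original app: the function name `findFirstFailingLibrary` is hypothetical, and the list of library paths is up to you. Because each name is printed before its dlopen call, the last line of output still identifies the culprit if dlopen crashes instead of returning an error.

```cpp
#include <dlfcn.h>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: load each candidate library in order and return the
// index of the first one that fails to load, or -1 if all of them load.
int findFirstFailingLibrary(const std::vector<std::string>& paths) {
    for (std::size_t i = 0; i < paths.size(); ++i) {
        // Print (and flush) before calling dlopen, so that a hard crash
        // inside dlopen still leaves the offending name as the last line.
        std::fprintf(stderr, "loading %s ...\n", paths[i].c_str());
        std::fflush(stderr);

        // RTLD_NOW resolves all symbols immediately, so unresolved
        // symbols are reported here rather than at first use.
        void* handle = dlopen(paths[i].c_str(), RTLD_NOW | RTLD_LOCAL);
        if (handle == nullptr) {
            const char* err = dlerror();
            std::fprintf(stderr, "  failed: %s\n", err ? err : "unknown error");
            return static_cast<int>(i);
        }
        // The handle is deliberately left open: later libraries in the
        // list may depend on this one, as they would in the real app.
    }
    return -1;
}
```

If every library loads cleanly from a harness like this (which calls dlopen on the main thread) but the same library crashes inside the application, that suggests the problem lies in the context of the call, such as which thread performs the dlopen, rather than in the library itself.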
After some further problems and some further Googling I ultimately found the real cause of my problem.
One cannot dlopen a library linked with CoreFoundation in a (sub)thread if CoreFoundation wasn't initialized in the first place. __CFInitialize is called, apparently checks whether the current thread is the main thread, and if it is not, crashes with a SIGTRAP.
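Two workarounds follow from this: make sure CoreFoundation is already initialized on the main thread before any secondary thread calls dlopen, or perform the dlopen itself on the main thread. The sketch below illustrates the second option in portable C++11, under stated assumptions: worker threads (such as the Ice thread-pool thread in the trace above) post a load request and wait on a future, while the main thread performs the actual dlopen. `MainThreadQueue`, `requestDlopen` and `runOne` are hypothetical names, and the sketch assumes the main thread is free to call `runOne()` in a loop, which may need restructuring if it currently just blocks.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical helper: worker threads post dlopen requests, and only the
// main thread ever calls dlopen.
class MainThreadQueue {
public:
    // Must be constructed on the main thread; remembers that thread's id.
    MainThreadQueue() : mainThread_(std::this_thread::get_id()) {}

    // Safe to call from any thread. The returned future becomes ready once
    // the main thread has performed the dlopen (nullptr on failure).
    std::future<void*> requestDlopen(const std::string& path) {
        auto task = std::make_shared<std::packaged_task<void*()>>(
            [path] { return dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL); });
        std::future<void*> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return result;
    }

    // Call from the main thread's loop: blocks until a request is queued,
    // then runs it on the calling (main) thread.
    void runOne() {
        assert(std::this_thread::get_id() == mainThread_);
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !tasks_.empty(); });
            task = std::move(tasks_.front());
            tasks_.pop();
        }
        task();  // dlopen (and the library's initializers) run here, unlocked
    }

private:
    std::thread::id mainThread_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
};
```

On the worker side, a loader such as `cast::loadLibrary` would call something like `queue.requestDlopen(path).get()` instead of calling dlopen directly; the blocking `get()` keeps the calling code synchronous, while the library's initializers run on the main thread.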
http://openradar.appspot.com/7209349
dyld is running the initializers in the shared library (think static initializers in C++), and one of them is causing CoreFoundation framework's __CFInitialize function to be run. [Is it possible this is the first thing touching CoreFoundation?] And for whatever reason, __CFInitialize is not happy. This could be some kind of missing dependency. Or it could be that the heap is corrupted. Or it could be a latent bug of some kind in the CoreFoundation framework.
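To make the "static initializers" point concrete: any namespace-scope object with a constructor is built during image initialization, before any of the image's functions are explicitly called. In a shared library, dyld does this inside the dlopen call (the runInitializers frames in the trace), on whichever thread called dlopen, and it does the same for each dependency it initializes along the way. The minimal sketch below compiles such an initializer into the program itself, so here it runs before main(); the names are hypothetical.

```cpp
#include <cstdio>

namespace {

// Constant-initialized to false before any dynamic initialization runs.
bool g_initializerRan = false;

struct RunsAtLoadTime {
    RunsAtLoadTime() {
        // Anything done here (including calls into frameworks) executes
        // while the image is being initialized, on the thread that caused
        // the load: the main thread for the program, or the dlopen caller
        // for a shared library.
        g_initializerRan = true;
        std::fprintf(stderr, "static initializer ran\n");
    }
};

// Constructed during image initialization, before main() is entered.
RunsAtLoadTime g_runsAtLoadTime;

}  // namespace

// Reports whether the load-time constructor above has already executed.
bool staticInitializerRan() { return g_initializerRan; }
```

If the same object lived in a user library, its constructor would run inside the library loader's dlopen call, on the Ice thread-pool thread, which is the situation the trace above shows.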
I would suggest ruling out the first two possibilities by a) running with all of the DYLD_PRINT_* environment variables set [see man dyld] and b) running under MallocDebug. If neither of those sheds any light, you're probably left with writing a radar for the CoreFoundation folks to look at.