dlopen succeeds the second time on a bad shared library on Ubuntu 11.04; does the right thing on CentOS 5.5
I have a bad shared library (it has an undefined symbol).
When I call dlopen() on it the first time, I get a NULL result with the correct error message from dlerror().
If I ignore the error message and call dlopen() using the same arguments, I get a non-null handle the second time (which indicates that the library was successfully loaded). This is obviously wrong.
This problem occurs under Ubuntu 11.04 (IIRC, 10.10 did not have this problem). CentOS 5.5 doesn't exhibit it.
In particular, this problem occurs within the Tcl interpreter. It tries to load a shared library first with a canonicalized absolute path and, if that fails, again with the exact path string the user gave. In my case, both should fail, but the second call incorrectly succeeds under Ubuntu 11.04.
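The two-step lookup described above can be sketched roughly as follows. This is only an illustration of the pattern, not Tcl's actual source: the helper name load_with_fallback is hypothetical, and the flags are simply the ones used in the test program in this question.

```c
#define _XOPEN_SOURCE 700   /* for realpath() under strict -std= modes */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: try the canonicalized absolute path first, then the
 * literal string the caller supplied. Returns NULL if both attempts fail. */
static void *load_with_fallback(const char *user_path)
{
    void *handle;
    const char *err;
    char *resolved = realpath(user_path, NULL);   /* malloc'ed, or NULL */

    /* First attempt: canonicalized absolute path. */
    if (resolved != NULL) {
        handle = dlopen(resolved, RTLD_NOW | RTLD_LOCAL);
        free(resolved);
        if (handle != NULL)
            return handle;
        err = dlerror();
        fprintf(stderr, "canonical path failed: %s\n",
                err ? err : "(no message)");
    }

    /* Second attempt: the path exactly as given. */
    handle = dlopen(user_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        err = dlerror();
        fprintf(stderr, "literal path failed: %s\n",
                err ? err : "(no message)");
    }
    return handle;
}
```

With a library that really cannot be loaded, both attempts should fail and the helper should return NULL. The behaviour reported here is that the second attempt returns a handle instead.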
Oddly enough, I am able to reproduce this problem only with my exact production shared library. If I make a reduced shared library, it works correctly.
A program like this is enough to show the problem with my production library:
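#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *h;

    /* First attempt: fails, and dlerror() reports the undefined symbol. */
    h = dlopen("./prod.so", RTLD_NOW | RTLD_LOCAL);
    printf("h is %p\n", h);
    printf("err is %s\n", dlerror());

    /* Second attempt, same arguments: unexpectedly returns a non-NULL handle. */
    h = dlopen("./prod.so", RTLD_NOW | RTLD_LOCAL);
    printf("h is %p\n", h);
    return 0;
}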
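One way to narrow this down might be to ask the loader whether the library stayed resident after the failed call. The sketch below is an assumption-laden diagnostic, not part of the original program: the function name resident_after_failure is hypothetical, and RTLD_NOLOAD is a glibc extension (documented as available since glibc 2.2) rather than part of POSIX.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical diagnostic.
 * Returns  1 if the library is still resident after a failed dlopen(),
 *          0 if the failed dlopen() left nothing loaded,
 *         -1 if the first dlopen() succeeded (nothing to diagnose). */
static int resident_after_failure(const char *path)
{
    const char *err;
    void *h = dlopen(path, RTLD_NOW | RTLD_LOCAL);

    if (h != NULL) {
        dlclose(h);
        return -1;
    }
    err = dlerror();
    fprintf(stderr, "first dlopen failed: %s\n", err ? err : "(no message)");

    /* RTLD_NOLOAD does not load anything: it returns a handle only if the
     * object is already resident in the process. */
    h = dlopen(path, RTLD_NOLOAD | RTLD_LAZY);
    if (h != NULL) {
        dlclose(h);
        return 1;
    }
    (void)dlerror();   /* clear the pending error message */
    return 0;
}
```

A result of 1 for ./prod.so would suggest that the failed load was not fully cleaned up, which would be consistent with the second dlopen() returning a handle.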
I've been seeing the edges of this issue occasionally for a while, but I've yet to pin down exactly what caused it (I've yet to find the right thing to Google for, but it's not something that Ubuntu feels to be a headline change so it's hard to find). Someone mentioned to me in passing on IRC what was wrong, but it was a while ago, I was up to my eyeballs in another problem at the time, and I didn't save enough information (written down or in memory) to be able to reconstruct it. So this is my best recollection…
As far as I can tell, there were some changes to either the link options used when building some libraries or to the default options used when resolving dependent libraries, and this is causing Tcl to fail to load everything it depends on. Because it fails to load some dependency (or maybe even a dependency of a dependency), it fails to load the rest of the library (because of the RTLD_NOW flag, which you want) and you get to where you are now. It's probably easy to fix, such as by changing link-time options, but I don't know what's wrong in detail.
In short, it's someone's bug, but whose I don't know. Many (but not all!) Linux distributors are not very good at feeding back upstream on issues that they discover or create.
NB: if your code above is a proxy for Tcl's load command, be aware that this is a tricky area in itself.