Replacing shared object (.so file) while main program is running
I have a shared object gateway.so (on Linux, in C), and an application a.out is using it.
QUESTION A
I guess that when process a.out starts, the loader loads gateway.so (I am not using dl functions like dlopen). So all runtime symbol resolutions to gateway.so happen in memory, and the loader does not need to access gateway.so on disk any more.
Am I right?
So I cannot replace gateway.so with an updated version while a.out is running, right?
QUESTION B
Another related question: once, when I substituted an outdated version of gateway.so, I got the message
"a.out: can't resolve symbol 'Test_OpenGateway'"
Which program component (loader, linker, ...) prints this output? Does this component execute as part of the same process context?
Question A
You can replace the library while an application is using it, if you do it the right way.
Before we get there, let's have a look at the main program binary. Here is an example program:
/* example.c */
#include <stdio.h>
#include <unistd.h>
void justsit(void) {
    for (;;) {
        sleep(1);
    }
}
int main(int argc, char **argv) {
    printf("My PID is %d\n", getpid());
    justsit();
    return 0;
}
Compile and start it:
$ gcc -Wall -o example example.c
$ ./example
My PID is 4339
Now it will just sit there, so open a new terminal to do this:
$ gcc -Wall -o example-updated example.c
$ cp example-updated example
cp: cannot create regular file `example': Text file busy
What happened? The kernel refused to modify the file example because a running process is executing it.
Now let's try to remove it:
$ rm example
What? That worked? Why can the file be removed but not replaced? The answer is that the file was not really removed, only its name. The kernel tells the filesystem to keep the file's contents, and when nothing has the file open any longer, the contents are removed as well. (As filesystem people would say: the dentry is removed immediately, but the inode is freed only when it has no users left.)
This can be seen in /proc (this is why the program prints its PID, so you can check it easily):
$ readlink /proc/4339/exe
/tmp/t/example (deleted)
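The same name-versus-inode behaviour can be checked from C. The sketch below uses a hypothetical helper, create_open_and_unlink, and an illustrative path under /tmp: it creates a file, keeps a descriptor open, unlinks the name, and then uses fstat(2) to show that the link count has dropped to 0 while the contents are still readable through the descriptor.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper: create PATH containing TEXT, keep it open,
   then unlink the name. Returns the still-open descriptor, or -1 on error.
   The directory entry is gone afterwards, but the inode stays alive
   because this descriptor still refers to it. */
static int create_open_and_unlink(const char *path, const char *text) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Link count of the inode behind FD: 0 means no name points to it any
   more, yet the data remains until the last user closes it.
   Returns -1 on error. */
static long link_count(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```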
Anyhow, because it works like this, one can safely upgrade a program by removing the old binary and putting the new one in the same place. There is a program that handles this: install(1).
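A related way to get the same effect without a separate delete step is rename(2): write the new version under a temporary name in the same directory, then rename it over the old one. rename() replaces the directory entry atomically, so the name never points at a half-written file, and processes that already have the old file open or mapped keep the old inode. This is a sketch of that technique, not necessarily what install(1) does internally; replace_file and the ".tmp.PID" suffix are hypothetical choices.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace PATH with new contents without modifying
   the old inode. Writes a temporary file next to PATH, then renames it
   over PATH. Returns 0 on success, -1 on error. */
static int replace_file(const char *path, const void *data, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.%ld", path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0755);
    if (fd < 0)
        return -1;
    /* Write and flush the new contents before the name is switched. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    /* Atomically point PATH at the new inode; the old inode lives on
       for any process that still has it open or mapped. */
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

rename() only works within one filesystem, which is why the temporary file is created next to the target rather than in /tmp.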
Ok, back to your question - shared objects.
Let's split the example into two parts, main.c and shared.c:
/* main.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
void justsit(void);
int main(int argc, char **argv) {
    printf("My PID is %d\n", getpid());
    justsit();
    return 0;
}
and
/* shared.c */
#include <stdio.h>
#include <unistd.h>
void justsit(void) {
    for (;;) {
        sleep(1);
    }
}
Compile them like this:
$ gcc -Wall -shared -fPIC -o libshared.so shared.c
$ gcc -Wall -L. -o main main.c -lshared
Would replacing libshared.so now give a similar "Text file busy" error? Let's see. First start the main program. The current directory is not in the library search path, so tell the dynamic linker to search there:
$ LD_LIBRARY_PATH=. ./main
My PID is 5697
Go to a different terminal and replace the library with something obviously broken:
$ echo "junk" > libshared.so
$
First, it was not refused the way replacing the program binary was. And in the other terminal something interesting happened: the program stopped with the following error message:
Segmentation fault
$
So it is NOT forbidden to replace a library in use by a program! But as seen from the example above it can have disastrous consequences.
Luckily, the same "trick" that was used to replace a running binary can be used to replace a library in use. Restart the main program (don't forget to recompile libshared.so too, since it was overwritten with junk) and see that it is safe to rm the library. /proc/PID/maps can be inspected to see which shared objects the process is using:
$ cat /proc/5733/maps | grep libshared.so
008a8000-008a9000 r-xp 00000000 08:01 2097292 /tmp/t/libshared.so
008a9000-008aa000 r--p 00000000 08:01 2097292 /tmp/t/libshared.so
008aa000-008ab000 rw-p 00001000 08:01 2097292 /tmp/t/libshared.so
$ rm libshared.so
$ cat /proc/5733/maps | grep libshared.so
008a8000-008a9000 r-xp 00000000 08:01 2097292 /tmp/t/libshared.so (deleted)
008a9000-008aa000 r--p 00000000 08:01 2097292 /tmp/t/libshared.so (deleted)
008aa000-008ab000 rw-p 00001000 08:01 2097292 /tmp/t/libshared.so (deleted)
The main program continues to run fine. Again this is because just the name (dentry) was removed from disk, not the actual contents (inode). After the removal it is safe to create a new file with the name libshared.so without affecting the running program.
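The "(deleted)" marker can also be checked programmatically. A minimal sketch, assuming Linux's /proc/self/maps format: map_then_unlink is a hypothetical demo helper that maps a small file and unlinks its name, and count_deleted_mappings counts the maps lines for that file that the kernel has tagged " (deleted)".

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count lines of /proc/self/maps whose pathname contains NAME and that
   the kernel marks " (deleted)": the mapping is alive but its file name
   has been unlinked. Returns -1 if /proc/self/maps cannot be read. */
static int count_deleted_mappings(const char *name) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char *line = NULL;
    size_t cap = 0;
    int count = 0;
    while (getline(&line, &cap, f) != -1) {
        if (strstr(line, name) && strstr(line, " (deleted)"))
            count++;
    }
    free(line);
    fclose(f);
    return count;
}

/* Hypothetical demo helper: create PATH with a few bytes, map it
   read-only, then unlink the name. Returns the mapping or NULL. */
static void *map_then_unlink(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return NULL;
    if (write(fd, "data", 4) != 4) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, 4, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping alone keeps the inode alive */
    if (p == MAP_FAILED)
        return NULL;
    unlink(path);
    return p;
}
```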
So, to summarize: just use the install command to install programs and libraries.
Question B
Yes, that is printed by the dynamic linker, in user space. Here is a small program that shows this:
/* execit.c */
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
    execl("./main", "main", (char *)NULL);
    printf("exec failed?\n");
    return 0;
}
Compile it with gcc -Wall -o execit execit.c. Remember that execl replaces the current process with the specified command.
$ ./execit
main: error while loading shared libraries: libshared.so: cannot open shared object file: No such file or directory
$ rm main
$ ./execit
exec failed?
What happened, and what does it tell us? In the first run there is the "error while loading shared libraries" message without "exec failed?". The missing "exec failed?" shows that the process was successfully replaced: the kernel transferred control to the dynamic linker, which then failed. After main was removed, execl itself fails early, the process is not replaced, and "exec failed?" is printed.
No, the file may still need to be read from disk after the runtime linker (ld.so) has mapped it into the process's address space. The mapping is created with the mmap(2) system call, with PROT_EXEC in its protection argument to allow execution.
mmap does not copy the entire file into memory. It creates a memory region whose pages are filled on demand: the first access to a page that is not yet resident triggers a page fault, which the kernel handles by reading the page from the appropriate offset in the file.
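The on-demand behaviour is also why overwriting a mapped library in place is dangerous, while replacing its name is not. The sketch below (hypothetical helpers map_readonly and overwrite_in_place, illustrative path) maps a file read-only with MAP_PRIVATE, as ld.so does for a library's non-writable segments, then writes new bytes into the same inode. POSIX leaves it unspecified whether such changes are visible through a MAP_PRIVATE mapping; on Linux, pages the process has not written to are backed by the shared page cache, so the new bytes show up in the running process.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first LEN bytes of PATH read-only and private. Nothing is read
   here: each page is filled from the page cache on first access
   (a page fault). Returns NULL on failure. */
static const char *map_readonly(const char *path, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping does not need the descriptor */
    return p == MAP_FAILED ? NULL : (const char *)p;
}

/* Overwrite the first LEN bytes of PATH in place, i.e. in the same
   inode, instead of unlinking or renaming. Returns 0 on success. */
static int overwrite_in_place(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, data, len, 0);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

A shell redirection such as echo "junk" > libshared.so goes one step further and truncates that same inode, which plausibly explains the crash shown in the first answer.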
Regarding the second question, it is the runtime linker (ld.so) that complains about this. The compile-time linker (ld) records the path of ld.so in the executable (the PT_INTERP program header); at exec time the kernel maps ld.so into the new process and starts it, so it runs in user space, in the same process, before main is called.
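The recorded interpreter path can be read from the executable itself (readelf -l shows it as "Requesting program interpreter"). A minimal sketch, assuming a 64-bit ELF file in the host's byte order; elf_interp is a hypothetical name. It walks the program headers and copies out the PT_INTERP string.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP path (the dynamic linker recorded at link time)
   of the 64-bit ELF file PATH into BUF. Returns 0 on success, -1 if the
   file cannot be read, is not 64-bit ELF, or has no PT_INTERP
   (for example a statically linked program). */
static int elf_interp(const char *path, char *buf, size_t bufsz) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    Elf64_Ehdr eh;
    int rc = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz includes the terminating NUL byte. */
        if (ph.p_filesz == 0 || ph.p_filesz > bufsz ||
            fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';
        rc = 0;
        goto out;
    }
out:
    fclose(f);
    return rc;
}
```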
To A: Yes indeed, once the shared library is mapped into memory you cannot replace it any more. The system may even have already loaded a previous version of the library for some other process, detect that the .so is already mapped into memory, and reuse that mapping as part of the startup process. That's why you always have to restart (even on *nixes) after critical updates ;)
To B: The symbols the executable uses are recorded in the symbol table within the binary. The dynamic loader scans this table and tries to resolve the address of each required function. If it cannot find one, you get this error. So the answer is: the message is produced by the dynamic linker/loader.
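The same kind of lookup can be triggered explicitly with dlopen(3) and dlsym(3), which hand the work to the dynamic loader; when a symbol is missing, dlerror(3) returns the loader's own message (its exact wording differs between C libraries). A sketch, assuming glibc, where libc.so.6 is the C library's soname; lib_has_symbol is a hypothetical helper. On glibc versions before 2.34, link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Ask the dynamic loader whether LIB exports SYM.
   Returns 1 if found, 0 if the symbol is missing, -1 if LIB cannot be
   loaded. Loader error messages are printed to stderr. */
static int lib_has_symbol(const char *lib, const char *sym) {
    void *h = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (!h) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    dlerror(); /* clear any stale error before dlsym */
    void *addr = dlsym(h, sym);
    const char *err = dlerror();
    int found = (err == NULL && addr != NULL);
    if (!found)
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolved to NULL");
    dlclose(h);
    return found;
}
```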
a. Right. In that case you must use the dl*() functions (dlopen(), dlsym(), dlclose()) and close the library as soon as possible.
b. If you substitute that file and the new version does not contain a required symbol, loading fails and you get that error.
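For point a., a minimal sketch of the "open late, close early" pattern. call_once is a hypothetical helper, and the library is assumed to export an int (void) function. Each call loads the library, calls the function once and unloads it, so a newer file installed under the same name can be picked up on the next call. dlclose(3) only unmaps the library when nothing else still references it, so this is best effort; the file should also be replaced via unlink or rename rather than overwritten in place.

```c
#include <dlfcn.h>

/* Load LIB, call its int (void) function SYM once, then unload it.
   Returns 0 and stores the result in *OUT, or -1 on error. */
static int call_once(const char *lib, const char *sym, int *out) {
    void *h = dlopen(lib, RTLD_NOW);
    if (!h)
        return -1;
    /* POSIX requires dlsym results to be convertible to function pointers. */
    int (*fn)(void) = (int (*)(void))dlsym(h, sym);
    if (!fn) {
        dlclose(h);
        return -1;
    }
    *out = fn();
    dlclose(h); /* drop our reference; unmapped once no one else uses it */
    return 0;
}
```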