What happens when you overwrite a memory-mapped executable?
Following the comments on one of my questions, I'm intrigued to know what happens when one overwrites an executable. I need to check my understanding of the matter.
Say I have `/usr/bin/myprog`. I run it, and so the OS loads `/usr/bin/myprog`, probably via a memory-mapped file (http://en.wikipedia.org/wiki/Memory-mapped_file#Common_uses). For whatever reason that process remains in memory, and I decide I've actually fixed a bug and I overwrite `/usr/bin/myprog`.
So, as far as I understand it:
- If an instance of `myprog` is already loaded and I replace the file from which `myprog` was loaded, the running instance of `myprog` is unmodified.
- If I run a new instance of `myprog`, it will use the new code.
Am I correct?
However, according to the article on memory-mapped files, such a technique allows a developer to treat portions of a file as if they are physical memory.
So I see a contradiction in how I understood things. If pages are truly only loaded in on demand, then assuming `myprog` is not 100% paged in, the Wikipedia article implies that new pages will be loaded from the file on disk, which has changed since the original image was loaded.
However, I am pretty certain that my two compiled images would not be the same and that the relevant address offsets for each file are not identical. So, assuming this happens, the instruction pointer is going to get very lost... I am pretty certain an operating system does not load parts of two different images into memory as part of the same process.
So how does the combination of memory-mapping and demand-paging work for the execution of programs, please? Would overwriting the file trigger a page fault on each of the executable's pages to ensure it is all loaded in for the currently running process?
I did a quick experiment with this:
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    printf("Program resident...\n");
    while (1)
    {
        printf("??? Just notifying you I'm still here...\n");
        usleep(1000000);  /* one second */
    }
    return 0;
}
And sure enough, I could (a) replace this executable whilst it was running, and (b) its output wasn't changed.
So what is going on? I'd particularly appreciate any suggestions for things I can do to see what happens (Linux or Windows).
Thanks all.
Edit: question to which I was referring that sparked this question: Upgrades without reboot - what kinds of problems happen in practice?
Also, I'm aware this doesn't specifically relate to programming, but rather to the outcome of updating an executable. I am still interested, however, and I can't think of a better place to ask it.
Under Linux, if you replace an executable while it is running, the results are unpredictable and it may crash. Pages which have been modified (e.g. "bss" initialised data) won't be affected, but pages which haven't been modified (e.g. most code) will.
My guess is that in your case, the string was in a part which was a modified (copied) page so wasn't affected.
However, all that only happens if you actually overwrite the same file.
Most of the time, when you replace an executable, you'll be replacing the directory entry with a different file. This is typically done by renaming a temporary file (in the same directory) over the existing one. This is what (for example) package managers do.
In the replacing-directory-entry case, the previous executable file continues to exist as a totally separate (still executing) file, and the previous executable can have its pages discarded and reloaded without a problem - it still sees the old file.
Quite what the linker does with its output, I don't know. But /usr/bin/install creates a new file. I expect this behaviour is quite deliberate.
1. What happens depends first of all on whether you `rm /usr/bin/myprog` and then create a new one, or whether you `open()` and `write()` to the existing `/usr/bin/myprog`.

2. If you `rm` the old `/usr/bin/myprog` file and then create a new one with the same name, the kernel/filesystem driver gives the new version a new inode, and the old inode stays around (you can still see the open file via the `/proc` filesystem) until the process that has it open closes it. Your existing `myprog` process has its own private version of the file, unmodified, until it `close()`s the file descriptor.

3. All operating systems (Windows, Linux, probably OS X) use demand-paged memory mapping for process loading (`mmap()` for POSIX; `CreateFileMapping()`/`MapViewOfFile()` on Windows). This way, any sections of the executable that don't get touched are never loaded into memory.

4. If this were a conventional `mmap()`'d file, and two processes both opened/mapped it, and neither of them specified `MAP_PRIVATE` (i.e. copy-on-write) in the call to `mmap()`, then the two processes would essentially be looking at the same physical memory pages and, provided both of them called `mmap()` with `PROT_READ | PROT_WRITE`, they would see each other's modifications.

5. If this were a conventional `mmap()`'d file, and process 1 had opened/mapped it, and then process 2 started to fiddle around with the file on disk itself through `write()` calls (not via `mmap`ing), process 1 does indeed see these changes. I guess the kernel notices that the file is being modified and reloads the affected pages.

6. I don't know exactly whether there's any special `mmap()` behaviour for executable images. If I hacked a pointer to one of my functions and modified the code, would it mark the page as dirty? Would the dirty page get written back to `/usr/bin/myprog`? When I try this, it segfaults, so I guess that while _TEXT pages are mapped in with `MAP_SHARED`, they probably don't get `PROT_WRITE` and therefore segfault when written to. _DATA sections get loaded into memory as well, of course, and those need to be modifiable, but they can be marked `MAP_PRIVATE` (copy-on-write), so they probably wouldn't keep their connection to the `/usr/bin/myprog` file.

7. Point 6 concerns the executable modifying itself directly; point 5 concerned modifying an arbitrary `mmap()`'d file at the `write()` level. When I try to modify an executable (which is `mmap()`'d) from another process with `write()`, I don't get the same results as in point 5. I can make all sorts of horrible changes to the executable with bare `write()` calls, and nothing happens. Then, when I exit the process and try to run it again, it crashes (of course, after everything I've done to the executable file). This confuses me: I can't find a combination of parameters to `mmap()` that behaves this way, not copy-on-write, but also not affected by changes to the mapped file.

8. Well, I went back to the Bible (Stevens), and the big issue is `MAP_PRIVATE` vs `MAP_SHARED`. `MAP_PRIVATE` is copy-on-write and `MAP_SHARED` isn't. `MAP_PRIVATE` will make a copy of a mapped page as soon as you modify it. It's undefined whether a modification to the original file will propagate to `MAP_PRIVATE` pages, but with OS X it doesn't. `MAP_SHARED` maintains the connection to the original file, allowing updates to the file to propagate to the memory pages and vice versa. If a memory block is mapped `MAP_PRIVATE`, no modification you make to it will ever be written back to disk; `MAP_SHARED`, on the other hand, allows modifications to the file through writes to the mapped pages.

9. The image loader maps executable files as `MAP_PRIVATE`. This explains the behaviour in point 6: hacking a pointer to a function's code and then modifying it, even if you had permission to do so, wouldn't write the data back to disk. Theoretically it should be possible to change the executable `/usr/bin/myprog` immediately after the OS image loader `mmap()`'s it, but whenever I look at very large executables with `vmmap`, their TEXT section always seems to be completely resident. I don't know whether this is because OS X's image loader touches all of the pages to ensure that they get copied, or whether OS X's page manager is just very aggressive about making pages resident (it is), but I haven't been able to make an executable on OS X whose TEXT section wasn't completely resident as soon as `main()` started.

10. The OS X image loader is very aggressive about loading mapped pages. I've noticed that when `mmap()`'ing a file, it has to be very large before OS X decides to leave any of it non-resident. A 1 GB file gets completely loaded, but only about 1.7 GB of a 3 GB file gets made resident. This is on a machine with 8 GB physical RAM running the 64-bit OS X kernel.
I found this link to be a much more succinct explanation; look at the part under "Update", where the author updated his original post.
http://it.toolbox.com/blogs/locutus/why-linux-can-be-updated-without-rebooting-12826
The entire file is NOT loaded into memory; it is read into buffers in clusters (a technical term; a cluster is typically 4 KB, but you can set the size when creating your filesystem).
When you open a file, the kernel follows the link and gives you a file descriptor (a number that it keeps track of internally) referring to the file's inode. When you delete the file, you are "unlinking" the inode from its directory entry; the open file descriptor still points to the inode. You can create a new file with the exact same name as the old one after deleting it, effectively "replacing" it, but it will point to a different inode. Any program that still has the old file open can still access it via its file descriptor, yet you have effectively upgraded the program in place. As soon as the program terminates (or closes the file) and starts up again (or tries to access it again), it accesses the new file, and there you have it: a completely in-place replacement of a file!
So, in Linux, your executable may only be read by page on demand, as you said, but it is read via the original open filehandle and not the updated inode of the new file that replaced your running executable program.
On Mac OS X, when I am upgrading an application that is running, the upgrade program either asks for the application to be shut down completely before the upgrade can proceed or, in some cases, proceeds but the upgrade doesn't take effect until the application restarts. It will sometimes list the names of in-use libraries that are blocking the upgrade. I mention this because the application appears to hold open the libraries on which it depends, and the upgrade program appears to know not to touch them while they are in use, thus never overwriting any part of a running application.
Google Chrome, for example, asks for a restart for upgrades to be installed and take effect on Windows, Linux and Mac OS X. I hope that gives you a clue as to where to start looking.
You are generally correct: the binary file stores only an image of the executable; what is actually in memory at any specific moment is a copy.
But these images usually have several sections, or segments. One of them stores the actual machine instructions, the compiled code of your program, and it is always loaded completely. Other sections may contain static data, such as the various constants (especially strings) you used throughout your code. These sections may be loaded by the OS on demand.