Discover program's image in memory
Here's something less-than-important that I've been musing about recently:
I know that my program's virtual address space contains the stack (of each thread) and a heap and some statically allocated memory and all that. But does it also contain the program's image with all the instructions? And is it possible somehow (no matter how platform dependent the trick) to find out the address range of my own image? Is the memory read-only?
In short: Can I make a program that prints itself?
If that can't be done, a lesser question would be, can I print my own stack? I was thinking something like this:
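#include <stdio.h>

const char * BASE;
void print_stack(void);

int main(int argc, char * argv[]) {
    BASE = (const char *)&argc;  /* an address inside main's frame */
    /* do stuff */
    print_stack();
    return 0;
}

void print_stack(void) {
    int sentinel;
    const char * bottom = (const char *)&sentinel;
    /* walk upward from this local variable toward BASE */
    while (bottom < BASE)
        printf("%02X ", (unsigned char)*bottom++);
}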
To answer your first question, of course it contains your program's instructions: you can only execute what you can access. To get at the address of your instructions, you can take the address of a function and start printing from there. You can then use a library like udis86 to disassemble them. Note, however, that your compiler isn't required to order the functions in any specific way, so starting at main and reading from there isn't guaranteed to cover everything and might run into unmapped memory.
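As a rough sketch of the "take the address of a function and start printing" idea: the helper below (dump_code_bytes and sample are made-up names for illustration) prints the first few bytes found at a function's address. It assumes a platform where a function pointer can be converted to a data pointer and where code pages are readable, which holds on typical Linux and OS X setups but is not guaranteed by ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print `count` bytes starting at the address of `fn`.
   Assumption: function pointers convert to object pointers and the code
   pages are readable (true on common POSIX systems, not required by C). */
size_t dump_code_bytes(void (*fn)(void), size_t count)
{
    const unsigned char *p = (const unsigned char *)(uintptr_t)fn;
    for (size_t i = 0; i < count; i++)
        printf("%02X ", p[i]);
    putchar('\n');
    return count;
}

/* A small function whose machine code we can look at. */
void sample(void)
{
    puts("sample");
}
```

Calling dump_code_bytes(sample, 16) from main prints whatever the compiler emitted at the start of sample. The bytes differ between compilers and optimization levels, and the function may be shorter than the requested count.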
To get at the entire instruction memory range (you're looking for the .text segment), you can look up the address and size from your operating system (on Linux, that info will be in /proc/[pid]/maps; on OS X you can either use vmmap or ask the kernel via the mach_vm_region() kernel trap), and then just read the memory directly. You can also use nm to dump the symbols of your program, isolate all those that point into the .text segment (they should be marked with T in nm output), and dump those. This is not a good method, since you'd have to disassemble everything to determine where each function ends in case there's padding between them.
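For the Linux route, something along these lines should work. It is only a sketch: print_executable_mappings is a made-up name, and it assumes /proc is mounted and that each line of /proc/self/maps starts with "start-end perms", as described in proc(5).

```c
#include <stdio.h>

/* Print every executable mapping of the current process.
   Returns the number of mappings printed, or -1 if /proc/self/maps
   cannot be opened (for example on a non-Linux system). */
int print_executable_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[1024];
    int found = 0;
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];
        /* Skip anything that does not look like "start-end perms ...". */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        /* perms looks like "r-xp"; the third character is the execute bit. */
        if (perms[2] == 'x') {
            printf("%lx-%lx %s (%lu bytes)\n", start, end, perms, end - start);
            found++;
        }
    }
    fclose(maps);
    return found;
}
```

The output normally contains one range for the program itself plus one for each shared library, and the permission column shows that these ranges are usually not writable.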
All the mapped memory is accessible, but not all of it will be writable (the .text segment normally isn't). One thing to keep in mind: the addresses will probably not be stable from invocation to invocation if your operating system implements ASLR.
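To address your second question: yes, you can print your own stack and symbolicate it with the help of third-party libraries, but not reliably the way you're trying to do it. The stack typically grows down (i.e. it starts at a high address and moves towards lower addresses; as an exercise for the reader, disassemble one of your functions via gdb or another disassembler and look at how memory on the stack gets allocated in the function prologue), so BASE will probably always be larger than the address of sentinel. On such a platform the while loop does run, walking upward from sentinel to argc, but all you get is raw bytes whose layout is up to the compiler. Comparing pointers to unrelated objects is also undefined behavior in C, and on a machine whose stack grows up the loop would never execute.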
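One way to get a symbolicated stack without walking memory by hand is the backtrace() family from <execinfo.h>. This is an assumption about what is available: glibc and OS X ship it, some other C libraries do not, and it is not necessarily the library the answer had in mind. print_backtrace is a made-up helper name.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the return addresses of the current call stack, one per line.
   Returns the number of frames printed, or -1 if the symbol strings
   could not be allocated. */
int print_backtrace(void)
{
    void *frames[64];
    int count = backtrace(frames, 64);

    /* backtrace_symbols() returns one malloc'd array of strings. */
    char **names = backtrace_symbols(frames, count);
    if (!names)
        return -1;

    for (int i = 0; i < count; i++)
        printf("#%d %s\n", i, names[i]);

    free(names);
    return count;
}
```

Function names only show up if the linker kept the symbols (on Linux, linking with -rdynamic helps); otherwise you mostly see raw addresses.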
Yes, the code bytes, usually called the program "text" in this context, are part of your virtual address space.
You can determine the address of a function, e.g. main(), and use that to determine a single valid address in a range of text pages. You will then have to call virtual-memory-specific APIs to determine the extent of the mapping at that address.
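On Linux, one possible stand-in for such an API is to scan /proc/self/maps for the range that contains the address. The sketch below assumes that file is available. find_mapping is a made-up name, and other systems need their own calls (mach_vm_region() on OS X, for instance).

```c
#include <stdint.h>
#include <stdio.h>

/* Look up the mapping that contains `addr`.
   On success stores its bounds in *start_out / *end_out and returns 1.
   Returns 0 if no mapping contains the address and -1 if
   /proc/self/maps cannot be opened. */
int find_mapping(const void *addr, uintptr_t *start_out, uintptr_t *end_out)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[1024];
    int found = 0;
    while (!found && fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        /* Ranges in the maps file are half-open: [start, end). */
        if (target >= start && target < end) {
            *start_out = start;
            *end_out = end;
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

Passing the address of main (converted to a data pointer, which common compilers allow) yields the bounds of the text mapping that holds it. Any other address in the process works the same way.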
Shared libraries (.so files) will have their texts mapped to discontiguous VM regions.