
Help with understanding a very basic main() disassembly in GDB

Heyo,

I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:

int main() {
  return 6;
}

Using gdb to disas main produces this:

0x08048374 <main+0>:    lea    0x4(%esp),%ecx
0x08048378 <main+4>:    and    $0xfffffff0,%esp
0x0804837b <main+7>:    pushl  -0x4(%ecx)
0x0804837e <main+10>:   push   %ebp
0x0804837f <main+11>:   mov    %esp,%ebp
0x08048381 <main+13>:   push   %ecx
0x08048382 <main+14>:   mov    $0x6,%eax
0x08048387 <main+19>:   pop    %ecx
0x08048388 <main+20>:   pop    %ebp
0x08048389 <main+21>:   lea    -0x4(%ecx),%esp
0x0804838c <main+24>:   ret  

Here is my best guess as to what I think is going on and what I need help with line-by-line:

lea 0x4(%esp),%ecx

Load the address of esp + 4 into ecx. Why do we add 4 to esp?

I read somewhere that this is the address of the command line arguments. But when I did x/d $ecx I get the value of argc. Where are the actual command line argument values stored?

and $0xfffffff0,%esp

Align stack

pushl -0x4(%ecx)

Push the address of where esp was originally onto the stack. What is the purpose of this?

push %ebp

Push the base pointer onto the stack

mov %esp,%ebp

Move the current stack pointer into the base pointer

push %ecx

Push the address of original esp + 4 on to stack. Why?

mov $0x6,%eax

I wanted to return 6 here so I'm guessing the return value is stored in eax?

pop %ecx

Restore ecx to value that is on the stack. Why would we want ecx to be esp + 4 when we return?

pop %ebp

Restore ebp to value that is on the stack

lea -0x4(%ecx),%esp

Restore esp to its original value

ret

I am a n00b when it comes to assembly so any help would be great! Also if you see any false statements about what I think is going on please correct me.

Thanks a bunch! :]


Stack frames

The code at the beginning of the function body:

push  %ebp
mov   %esp, %ebp

is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.

After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you'd just invoke ret now, this address would be popped from the stack into the %eip (instruction pointer) and the code would execute further from that address (of the next instruction after the call). But we don't return yet, do we? ;-)

You then push the %ebp register to save its previous value, because you'll use the register for something else shortly. (BTW, it usually contains the base pointer of the caller function, and when you peek at that value, you'll find a previously stored %ebp, which is again the base pointer of the function one level higher, so you can trace the call stack that way.) Once %ebp is saved, you can store the current %esp (stack pointer) in it, so that %ebp points to the same address: the base of the current local stack. The %esp will move back and forth inside the procedure as you push and pop values on the stack or reserve and free local variables. But %ebp will stay fixed, still pointing to the base of the local stack frame.

Accessing parameters

Parameters passed to the procedure by the caller are "buried just under the ground" (that is, they have positive offsets relative to the base, because the stack grows down). %ebp holds the address of the base of the local stack, where the previous value of %ebp lies. Just past it (that is, at 4(%ebp)) lies the return address. So the first parameter will be at 8(%ebp), the second at 12(%ebp) and so on.
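To make those offsets concrete, here is a minimal sketch in C that models a 32-bit stack as an array of words. Nothing here comes from the original post: the names mem, push, load and call_and_sum are made up, and so is the return address. It only mimics what call, push %ebp and mov %esp,%ebp do, assuming the usual cdecl convention where arguments are pushed right to left.

```c
#include <stdint.h>

/* Toy model of a 32-bit stack: mem[] holds 4-byte words, and esp/ebp are
   byte offsets into it. All names are illustrative only. */
enum { STACK_BYTES = 64 };
static uint32_t mem[STACK_BYTES / 4];
static uint32_t esp = STACK_BYTES;   /* empty stack: esp starts at the top */
static uint32_t ebp = 0;

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t load(uint32_t addr) { return mem[addr / 4]; }

/* Mimics `call f(arg1, arg2)` plus f's prologue, reads the parameters the
   way the callee would, then unwinds everything again. */
static uint32_t call_and_sum(uint32_t arg1, uint32_t arg2)
{
    uint32_t sum;

    push(arg2);            /* cdecl: arguments are pushed right to left */
    push(arg1);
    push(0x0804abcd);      /* return address pushed by `call` (made up) */
    push(ebp);             /* push %ebp */
    ebp = esp;             /* mov %esp,%ebp */

    /* 0(%ebp) = saved ebp, 4(%ebp) = return address,
       8(%ebp) = first parameter, 12(%ebp) = second parameter */
    sum = load(ebp + 8) + load(ebp + 12);

    ebp = load(esp);       /* pop %ebp, part 1: reload the saved value */
    esp += 4;              /* pop %ebp, part 2: drop the slot */
    esp += 4;              /* ret: pops the return address */
    esp += 8;              /* caller removes its two arguments */
    return sum;
}
```

Calling call_and_sum(2, 40) reads the 2 from 8(%ebp) and the 40 from 12(%ebp), and leaves esp back where it started.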

Local variables

And local variables can be allocated on the stack above the base (that is, they have negative offsets relative to the base). Just subtract N from %esp and you've allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) is the second etc. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to its previous position before you try to restore %ebp through pop %ebp at the end of the procedure. You can do it two ways:
1. You can free only the local variables by adding N back to %esp (stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer to them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring %esp from %ebp, which was fixed earlier to the base of the stack. It'll restore the stack pointer to the state it had just after entering the procedure and saving %esp into %ebp. It's like loading a previously saved game when you've messed something up ;-)
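Both ways of freeing the locals can be checked on a toy model. This is only a sketch in C, not anything a compiler emits: mem, push, pop and frame_roundtrip are hypothetical names, the caller's %ebp value 0x1234 is made up, and n_bytes is assumed to be a multiple of 4 that fits in the small array.

```c
#include <stdint.h>

/* Toy 32-bit stack: mem[] holds 4-byte words, esp/ebp are byte offsets. */
enum { STACK_BYTES = 64 };
static uint32_t mem[STACK_BYTES / 4];
static uint32_t esp, ebp;

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t pop(void) { uint32_t v = mem[esp / 4]; esp += 4; return v; }

/* Runs a prologue, reserves n_bytes of locals, writes the first local, then
   frees the locals with method 1 (add) or method 2 (mov %ebp,%esp).
   Returns esp as it stands right before `ret`.
   Assumes n_bytes is a multiple of 4 and at most 56. */
static uint32_t frame_roundtrip(uint32_t n_bytes, int use_mov)
{
    esp = STACK_BYTES - 4;     /* pretend the return address sits at the top */
    ebp = 0x1234;              /* caller's base pointer (made up) */

    push(ebp);                 /* push %ebp */
    ebp = esp;                 /* mov %esp,%ebp */
    esp -= n_bytes;            /* sub $N,%esp : reserve the locals */
    mem[(ebp - 4) / 4] = 6;    /* -4(%ebp) : the first local word */

    if (use_mov)
        esp = ebp;             /* method 2: mov %ebp,%esp */
    else
        esp += n_bytes;        /* method 1: add $N,%esp */

    ebp = pop();               /* pop %ebp : caller's base pointer is back */
    return esp;
}
```

Either way esp ends up pointing at the return address again and ebp holds the caller's value. The mov %ebp,%esp followed by pop %ebp pair is what the single leave instruction does.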

Turning off frame pointers

It's possible to get less messy assembly from gcc -S by adding the switch -fomit-frame-pointer. It tells GCC not to generate any code for setting up and tearing down the stack frame unless it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to walk up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets and it saves some space and time.

Call Frame Information

Sometimes you will come across some strange assembler directives starting with .cfi interleaved with the function header. This is the so-called Call Frame Information. It's used by debuggers to track the function calls. But it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off too in your assembly, by adding the switch -fno-dwarf2-cfi-asm. This tells GCC to use plain old labels instead of those strange .cfi directives, and it adds special data structures at the end of your assembly, referring to those labels. This doesn't turn off the CFI, it just changes the format to a more "transparent" one: the CFI tables are then visible to the programmer.


You did pretty well with your interpretation. When a function is called, the return address is automatically pushed to the stack, which is why argc, the first argument, has been pushed back to 4(%esp). argv is at 8(%esp); it points to an array holding one pointer for each argument, followed by a null pointer. The function saves %ecx, which holds the original %esp plus 4, on the stack so that %esp can be set back to its original, unaligned value upon return. The value of %ecx at return doesn't matter, which is why it is used as temporary storage for the %esp reference. Other than that, you are correct with everything.
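If it helps, the whole listing can be replayed by hand. Below is a rough sketch in C that steps through main+0 to main+21 on a toy machine; struct regs, mem, push, pop and run_main are invented for this illustration, and the return address 0x0804abcd is made up. It assumes entry_esp is a multiple of 4, at least 0x20 and small enough to fit the array.

```c
#include <stdint.h>

/* Register file for the toy model; field names mirror the x86 registers. */
struct regs { uint32_t esp, ebp, ecx, eax; };

static uint32_t mem[1024];   /* 4-byte words, indexed by byte address / 4 */

static void push(struct regs *r, uint32_t v) { r->esp -= 4; mem[r->esp / 4] = v; }
static uint32_t pop(struct regs *r) { uint32_t v = mem[r->esp / 4]; r->esp += 4; return v; }

/* Replays main+0 .. main+21 and returns the registers just before `ret`. */
static struct regs run_main(uint32_t entry_esp, uint32_t caller_ebp)
{
    struct regs r = { entry_esp, caller_ebp, 0, 0 };

    mem[r.esp / 4] = 0x0804abcd;      /* return address left by `call` (made up) */

    r.ecx = r.esp + 4;                /* lea 0x4(%esp),%ecx  : address of argc */
    r.esp &= 0xfffffff0u;             /* and $0xfffffff0,%esp: 16-byte align */
    push(&r, mem[(r.ecx - 4) / 4]);   /* pushl -0x4(%ecx)    : copy of return address */
    push(&r, r.ebp);                  /* push %ebp */
    r.ebp = r.esp;                    /* mov %esp,%ebp */
    push(&r, r.ecx);                  /* push %ecx           : save original esp + 4 */
    r.eax = 6;                        /* mov $0x6,%eax       : return value */
    r.ecx = pop(&r);                  /* pop %ecx */
    r.ebp = pop(&r);                  /* pop %ebp */
    r.esp = r.ecx - 4;                /* lea -0x4(%ecx),%esp : undo the alignment */
    return r;
}
```

For example, run_main(0x7c8, 0x7f8) starts with a stack pointer that is not 16-byte aligned and ends with esp back at 0x7c8, ebp back at 0x7f8 and 6 in eax, so ret finds the real return address where call left it.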


Regarding your first question (where the command line arguments are stored): arguments to a function sit just past the saved ebp and the return address, at positive offsets from ebp. I must say, your "real" main begins at <main+10>, where it pushes ebp and moves esp to ebp. I think that gcc messes everything up with all those leas just to replace the usual operations (additions and subtractions) on esp before and after function calls. Usually a routine looks like this (a simple function I did as an example):

   0x080483b4 <+0>:     push   %ebp     
   0x080483b5 <+1>:     mov    %esp,%ebp
   0x080483b7 <+3>:     sub    $0x10,%esp            # room for local variables
   0x080483ba <+6>:     mov    0xc(%ebp),%eax        # get arg2
   0x080483bd <+9>:     mov    0x8(%ebp),%edx        # and arg1
   0x080483c0 <+12>:    lea    (%edx,%eax,1),%eax    # just add them
   0x080483c3 <+15>:    mov    %eax,-0x4(%ebp)       # store in local var
   0x080483c6 <+18>:    mov    -0x4(%ebp),%eax       # and return the sum
   0x080483c9 <+21>:    leave
   0x080483ca <+22>:    ret 

Perhaps you've enabled some optimizations, which could make the code trickier. Finally yes, the return value is stored in eax. Your interpretation is quite correct anyway.
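The C source for that example listing isn't shown, so this is only a guess at what it might have looked like; the name add and the variable names are assumptions. Two int parameters are added, the result goes into one local, and the local is returned.

```c
/* Hypothetical source for the example listing: two int parameters,
   one local variable, and the sum returned to the caller. */
int add(int a, int b)
{
    int sum = a + b;   /* the listing stores this at -0x4(%ebp) */
    return sum;        /* and loads it back into %eax before leave/ret */
}
```

Compiling something like this as 32-bit code without optimization (for instance gcc -m32 -O0 -S) should give a similar push/mov/sub prologue, though the exact instructions depend on the compiler version.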


The only thing I think that's outstanding from your original questions is why the following statements exist in your code:

0x08048381 <main+13>:   push   %ecx
0x08048382 <main+14>:   mov    $0x6,%eax
0x08048387 <main+19>:   pop    %ecx

The push and pop of %ecx at <main+13> and <main+19> don't seem to make much sense - and they don't really do anything in this example, but consider the case where your code invokes function calls.

There's no way for the system to guarantee that the calls to other functions - which will set up their own stack activation frames - won't reset register values. In fact they probably will. The code therefore sets up a saved register section on the stack where any registers used by the code (other than %esp and %ebp, which are already saved through the regular stack setup) are stored on the stack before possibly handing control over to function calls in the "meat" of the current code block.

When these potential calls return, the system then pops the values off the stack to restore the pre-call register values. If you were writing assembler directly rather than compiling, you'd be responsible for storing and retrieving these register values, yourself.

In the case of your example code, however, there are no function calls - only a single instruction at <main+14> where you're setting the return value, but the compiler can't know that, and preserves its registers as usual.


It would be interesting to see what would happen here if you added C statements which pushed other values onto the stack after <main+14>. If I'm right about this being a saved register section of the stack, you'd expect the compiler to insert automatic pop statements prior to <main+19> in order to clear these values.
