
Help deciphering simple Assembly Code

I am learning assembly using GDB & Eclipse

Here is some simple C code:

#include <stdlib.h>   /* for EXIT_SUCCESS */

int absdiff(int x, int y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

int main(void)
{
    int x = 10;
    int y = 15;
    absdiff(x, y);
    return EXIT_SUCCESS;
}

Here are the corresponding assembly instructions for main():

          main:
080483bb:   push %ebp  #push old frame pointer onto the stack
080483bc:   mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be:   sub $0x18,%esp  # ???
25         int x = 10;    
080483c1:   movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)
26         int y = 15;
080483c8:   movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)
28         absdiff(x,y);
080483cf:   mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax
080483d2:   mov %eax,0x4(%esp) # from this point on, I am confused
080483d6:   mov -0x4(%ebp),%eax 
080483d9:   mov %eax,(%esp)
080483dc:   call 0x8048394 <absdiff>
31         return EXIT_SUCCESS;
080483e1:   mov $0x0,%eax
32        }

Basically, I would like help making sense of this assembly code and understanding why it does things in this particular order. The points where I am stuck are marked in the assembly comments. Thanks!


The instructions from 0x080483cf to 0x080483d9 copy x and y out of the current stack frame and store them at the top of the stack as the arguments for absdiff() (this is typical; see e.g. http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl). If you look at the disassembly of absdiff() (starting at 0x8048394), I bet you'll see it pick these values up from the stack and use them.

This might seem like a waste of cycles in this instance, but that's probably because you've compiled without optimisation, so the compiler does literally what you asked for. If you use e.g. -O2, you'll probably see most of this code disappear.


First it bears saying that this assembly is in the AT&T syntax version of x86_32, and that the order of arguments to operations is reversed from the Intel syntax (used with MASM, YASM, and many other assemblers and debuggers).

080483bb:   push %ebp  #push old frame pointer onto the stack
080483bc:   mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be:   sub $0x18,%esp  # ???

This enters a stack frame. A frame is an area of memory between the stack pointer (esp) and the base pointer (ebp). This area is intended to be used for local variables that have to live on the stack. NOTE: Stack frames don't have to be implemented in this way, and GCC has the optimization switch -fomit-frame-pointer that does away with it except when alloca or variable sized arrays are used, because they are implemented by changing the stack pointer by arbitrary values. Not using ebp as the frame pointer allows it to be used as an extra general purpose register (more general purpose registers is usually good).

Using the base pointer makes several things simpler for compilers and debuggers, because the location of each variable relative to the base does not change while the function runs. You can also address variables relative to the stack pointer and get the same results, but the stack pointer tends to move around, so the same location may need a different offset at different times.

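In this code 0x18 (24) bytes are reserved on the stack for local use. Going by the offsets used later in main(), 8 of those bytes hold x and y (at -0x4(%ebp) and -0x8(%ebp)), 8 hold the outgoing arguments for absdiff (at (%esp) and 0x4(%esp)), and the 8 bytes in between are never touched in the code shown.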

This code so far is often called the function prologue (not to be confused with the programming language "prolog").

25         int x = 10;    
080483c1:   movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)

This line moves the constant 10 (0xA) to a location within the current stack frame, addressed relative to the base pointer. Because the stack grows downward in memory, the frame lies at lower addresses than the base pointer, so the offset is negative rather than positive. If the same location were addressed relative to the stack pointer, a different offset would be used, and it would be positive.

You are correct that this value could have been pushed rather than copied like this. I suspect that this is done this way because you have not compiled with optimizations turned on. By default gcc (which I assume you are using based on your use of gdb) does not optimize much, and so this code is probably the default "copy a constant to a location in the stack frame" code. This may not be the case, but it is one possible explanation.

26         int y = 15;
080483c8:   movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)

Similar to the previous line of code. These two lines of code put the 10 and 15 into local variables. They are on the stack (rather than in registers) because this is unoptimized code.

28         absdiff(x,y);

gdb printing this means that the instructions that follow belong to this source line, not that the function itself is executing (yet).

080483cf:   mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax

In preparation for calling the function, the values being passed as arguments need to be retrieved from their storage locations (even though they were just placed there and their values are known, because the code is unoptimized).

080483d2:   mov %eax,0x4(%esp) # from this point on, I am confused

This is the second half of copying one local variable's value to the stack so that it can be used as an argument to the function. You can't (usually) move from one memory address to another on x86, so the value has to go through a register (eax in this case).

080483d6:   mov -0x4(%ebp),%eax 
080483d9:   mov %eax,(%esp)

These two lines do the same thing for the other variable. Since this value goes to the very top of the stack, the second instruction uses no offset.

080483dc:   call 0x8048394 <absdiff>

This pushes the return address onto the top of the stack and jumps to the address of absdiff.

You didn't include code for absdiff, so you probably did not step through that.

31         return EXIT_SUCCESS;
080483e1:   mov $0x0,%eax

A C program returns 0 from main to indicate success, and EXIT_SUCCESS is defined as 0 here (the macro comes from <stdlib.h>). Integer return values are put in eax, and the startup code that called main will pass that value as the argument to the exit function.

32        }

This is the end. The reason that gdb stopped here is that there are things that actually happen to clean up. In C++ it is common to see destructors for local class instances being called here, but in C you will probably just see the function epilogue. This is the complement to the function prologue, and consists of returning the stack pointer and base pointer to the values they originally had. Sometimes this is done with similar math on them, but sometimes it is done with the leave instruction. There is also an enter instruction which can be used for the prologue, but gcc doesn't do this (I don't know why). If you had continued to view the disassembly here you would have seen the epilogue code and a ret instruction.

Something you may be interested in is the ability to tell gcc to produce assembly files. If you do:

gcc -S source_file.c

a file named source_file.s will be produced with assembly code in it.

If you do:

 gcc -S -O source_file.c

Then the same thing will happen, but some basic optimizations will be done. This will probably make reading the assembly code easier since the code will not likely have as many odd instructions that seem like they could have been done a better way (like moving constant values to the stack, then to a register, then to another location on the stack and never using the push instruction).

The regular optimization flags for gcc are:

-O0         default -- none
-O1         a few optimizations
-O          the same as -O1
-O2         a lot of optimizations
-O3         a bunch more, some of which may take a long time and/or make the code a lot bigger
-Os         optimize for size -- similar to -O2, but not quite

If you are actually trying to debug C programs then you will probably want the least optimizations possible since things will happen in the order that they are written in your code and variables won't disappear.

You should have a look at the gcc man page:

man gcc


Remember, if you're running in a debugger or debug mode, the compiler reserves the right to insert whatever debugging code it likes and make other nonsensical code changes.

For example, this is Visual Studio's debug main():

int main(void) {  
001F13D0  push        ebp  
001F13D1  mov         ebp,esp  
001F13D3  sub         esp,0D8h  
001F13D9  push        ebx  
001F13DA  push        esi  
001F13DB  push        edi  
001F13DC  lea         edi,[ebp-0D8h]  
001F13E2  mov         ecx,36h  
001F13E7  mov         eax,0CCCCCCCCh  
001F13EC  rep stos    dword ptr es:[edi]  
    int x = 10;  
001F13EE  mov         dword ptr [x],0Ah  
    int y = 15;  
001F13F5  mov         dword ptr [y],0Fh  
    absdiff(x,y);
001F13FC  mov         eax,dword ptr [y]  
001F13FF  push        eax  
001F1400  mov         ecx,dword ptr [x]  
001F1403  push        ecx  
001F1404  call        absdiff (1F10A0h)  
001F1409  add         esp,8  
    *(int*)nullptr = 5;
001F140C  mov         dword ptr ds:[0],5  
    return 0;  
001F1416  xor         eax,eax  
}
001F1418  pop         edi  
001F1419  pop         esi  
001F141A  pop         ebx  
001F141B  add         esp,0D8h  
001F1421  cmp         ebp,esp  
001F1423  call        @ILT+300(__RTC_CheckEsp) (1F1131h)  
001F1428  mov         esp,ebp  
001F142A  pop         ebp  
001F142B  ret  

It helpfully posts the C++ source next to the corresponding assembly. In this case, you can fairly clearly see that x and y are stored on the stack explicitly, and an explicit copy is pushed on, then absdiff is called. I explicitly de-referenced nullptr to cause the debugger to break in. You may wish to change compiler.


Compile with -fverbose-asm -g -save-temps for additional information with GCC.
