GCC 4.3/4.4 vs MSC 6 on i386 optimization for size fail
I'm not sure what I'm doing wrong, but I've read the GCC manuals on calling conventions and found nothing useful there. My current problem is that GCC generates excessively large code for a very simple operation, as shown below.
main.c:
#ifdef __GNUC__
// defines for GCC
typedef void (* push1)(unsigned long);
#define PUSH1(P,A0)((push1)P)((unsigned long)A0)
#else
// defines for MSC
typedef void (__stdcall * push1)(unsigned long);
#define PUSH1(P,A0)((push1)P)((unsigned long)A0)
#endif
int main() {
// pointer to nasm-linked exit syscall "function".
// will not work for win32 target, provided as an example.
PUSH1(0x08048200,0x7F);
}
Now, let's build it with GCC and dump the result with: gcc -c main.c -Os; objdump -d main.o
main.o: file format elf32-i386
Disassembly of section .text:
00000000 <.text>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: b8 00 82 04 08 mov $0x8048200,%eax
f: 55 push %ebp
10: 89 e5 mov %esp,%ebp
12: 51 push %ecx
13: 83 ec 10 sub $0x10,%esp
16: 6a 7f push $0x7f
18: ff d0 call *%eax
1a: 8b 4d fc mov -0x4(%ebp),%ecx
1d: 83 c4 0c add $0xc,%esp
20: c9 leave
21: 8d 61 fc lea -0x4(%ecx),%esp
24: c3 ret
That's the smallest code I am able to get. If I don't specify -O*, or specify other values, it is 0x29 or more bytes long.
Now, let's build it with the MS C compiler v6 (yes, the one from around '98, IIRC) and dump the result with: wine /mnt/ssd/msc/6/cl /c /TC main.c; wine /mnt/ssd/msc/6/dumpbin /disasm main.obj
Dump of file main.obj
File Type: COFF OBJECT
_main:
00000000: 55 push ebp
00000001: 8B EC mov ebp,esp
00000003: 6A 7F push 7Fh
00000005: B8 00 82 04 08 mov eax,8048200h
0000000A: FF D0 call eax
0000000C: 5D pop ebp
0000000D: C3 ret
How do I make GCC generate code of a similar size? Any hints or tips? Don't you agree the resulting code should be as small as that? Why does GCC add so much useless code? I thought it would be smarter than something as old as MSC 6 when optimizing for size. What am I missing here?
main() is special here: GCC does some extra work to make the stack 16-byte aligned at the entry point of the program. So the sizes of the two results aren't directly comparable. Try renaming main() to f() and you'll see that GCC generates drastically different code.
(The MSVC-compiled code doesn't need to care about alignment because Windows has different rules for stack alignment.)
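A minimal sketch of that experiment, assuming the same stub address as in the question. The file name f.c and the names f, f_via, record_code and EXIT_STUB are made up for illustration. Build it the same way as before (for example gcc -m32 -Os -c f.c; objdump -d f.o on a 64-bit host) and compare the code for f() with the code for main() above; the realignment of %esp at function entry should be gone, although GCC may still pad the stack around the call.

```c
/* f.c: hypothetical file name; f, f_via, record_code and EXIT_STUB are
   illustrative names, not taken from the question. */
typedef void (*push1)(unsigned long);
#define PUSH1(P, A0) ((push1)(P))((unsigned long)(A0))

/* Stub address taken from the question; only valid in that i386 binary. */
#define EXIT_STUB 0x08048200UL

/* The question's call, moved out of main() into an ordinary function.
   Do not call this unless a stub really lives at EXIT_STUB. */
void f(void)
{
    PUSH1(EXIT_STUB, 0x7F);
}

/* Same call with the target passed in, so it can be run anywhere. */
void f_via(push1 target)
{
    PUSH1(target, 0x7F);
}

/* Harmless stand-in target that just records its argument. */
unsigned long last_code;
void record_code(unsigned long code)
{
    last_code = code;
}
```

Only f() matters for the size comparison; f_via() and record_code() exist so the call path can be exercised on a machine where nothing is mapped at EXIT_STUB.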
This is the best reference I can give: I'm on Windows right now and too lazy to log in to my Linux machine to test. Here (MinGW GCC 4.5.2), the code is smaller than yours. One difference is the calling convention: stdcall has an advantage of a few bytes over cdecl (the default on GCC if nothing is specified, with -O1 and, I guess, with -Os too), because with stdcall the caller does not have to clean up the stack after the call.
Here's how I compiled it and the result (the source code is copied verbatim from your post):
gcc -S test.c:
_main:
pushl %ebp #
movl %esp, %ebp #,
andl $-16, %esp #,
subl $16, %esp #,
call ___main #
movl $127, (%esp) #,
movl $134513152, %eax #, tmp59
call *%eax # tmp59
leave
ret
gcc -c -o test.o test.c && objdump -d test.o:
test.o: file format pe-i386
Disassembly of section .text:
00000000 <_main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 e4 f0 and $0xfffffff0,%esp
6: 83 ec 10 sub $0x10,%esp
9: e8 00 00 00 00 call e <_main+0xe>
e: c7 04 24 7f 00 00 00 movl $0x7f,(%esp)
15: b8 00 82 04 08 mov $0x8048200,%eax
1a: ff d0 call *%eax
1c: c9 leave
1d: c3 ret
1e: 90 nop
1f: 90 nop
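On the calling-convention point: the GCC branch of the question's typedef does not request stdcall at all, so GCC uses cdecl there. A hedged sketch of how the typedef could ask for stdcall on both compilers follows. The STDCALL macro and the names call_stdcall and stdcall_recorder are made up for illustration, and the attribute is only applied on 32-bit x86, where it has an effect. This only produces a working program if the stub at the target address really follows stdcall and pops its own argument; for an exit stub that never returns it makes no practical difference.

```c
/* STDCALL, call_stdcall and stdcall_recorder are illustrative names. */
#if defined(__GNUC__) && defined(__i386__)
#define STDCALL __attribute__((stdcall))   /* GCC spelling, 32-bit x86 only */
#elif defined(_MSC_VER)
#define STDCALL __stdcall                  /* MSC spelling */
#else
#define STDCALL                            /* other targets: default convention */
#endif

typedef void (STDCALL *push1)(unsigned long);
#define PUSH1(P, A0) ((push1)(P))((unsigned long)(A0))

/* Stand-in callee using the same convention; it records its argument. */
unsigned long stdcall_seen;
void STDCALL stdcall_recorder(unsigned long code)
{
    stdcall_seen = code;
}

/* Caller: with stdcall the callee removes the argument from the stack,
   so the caller needs no cleanup instruction after the call. */
void call_stdcall(push1 target)
{
    PUSH1(target, 0x7F);
}
```

Compiling this with -Os for i386 and disassembling call_stdcall() is one way to check whether the stack adjustment after the call disappears with your GCC version.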