开发者

GCC/g++ cout << vs. printf()

  • Why does printf("hello world") ends 开发者_Python百科up using more CPU instructions in the assembled code (not considering the standard library used) than cout << "hello world"?

For C++ we have:

movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc

For C:

movl    $.LC0, %eax
movq    %rax, %rdi
movl    $0, %eax
call    printf
  • WHAT are line 2 from the C++ code and lines 2,3 from the C code for?

I'm using gcc version 4.5.2


For 64bit gcc -O3 (4.5.0) on Linux x86_64, this reads for: cout << "Hello World"

movl    $11, %edx         ; String length in EDX
movl    $.LC0, %esi       ; String pointer in ESI
movl    $_ZSt4cout, %edi  ; load virtual table entry of "cout" for "ostream"
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

and, for printf("Hello World")

movl    $.LC0, %edi       ; String pointer to EDI
xorl    %eax, %eax        ; clear EAX (maybe flag for printf=>no stack arguments)
call    printf

which means, your sequence depends entirely on any specific compiler implementation, its version and probably compiler options. Your Edit states,you use gcc 4.5.2 (which is fairly new). Seems like 4.5.2 introduces additional 64bit register fiddling in this sequence for whatever reason. It saves the 64bit RAX to RDI before zeroing it out - which makes absolutely no sense (at least for me).

Much more interesting: 3 Argument call sequence (g++ -O1 -S source.cpp):

 void c_proc()
{
 printf("%s %s %s", "Hello", "World", "!") ;
}

 void cpp_proc()
{
 std::cout << "Hello " << "World " << "!";
}

leads to (c_proc):

movl    $.LC0, %ecx
movl    $.LC1, %edx
movl    $.LC2, %esi
movl    $.LC3, %edi
movl    $0, %eax
call    printf

with .LCx being the strings, and no stack pointer involved!

For cpp_proc:

movl    $6, %edx
movl    $.LC4, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $6, %edx
movl    $.LC5, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $1, %edx
movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

You see now what this is all about.

Regards

rbo


The caller code is most of the time irrelevant to performance.

I guess the line 2 of the C++ code stores the address of std::cout as the implicit 'this' argument of the operator<< method.

and i might be wrong on the C part, but it seems to me that it is incomplete. the 32bit upper part of rax is not initialized in this snippet, it might be initialized earlier. (no, i'm wrong here).

from what i understand (i might be wrong), the problem with 64bit registers, is that most of the time they cannot be initialized by immediates, so you have to play with 32bit operations to get the desired result. so the compiler plays with 32bit registers to initialize the 64bit rdi register.

And it seems that printf takes the value of al (the LSB of eax) as an input that tells printf() how many xmm 128 registers are used as input. It looks like an optimization to be able to pass the input string into the xmm registers or some other funny business.


int printf( const char*, ...) is a variadic function that can take one or more arguments; whereas ostream& operator<< (ostream&, signed char*) takes exactly two. I believe that that accounts for the difference in instructions needed to invoke them.

Line 2 in the C++ disassembly is where it passes the ostream& (in this case cout). so the function knows what stream object it is outputting to.

Since both end up making a function call, the comparison is largely irrelevant; the code executed within the function call will be far more significant. The operator<< is overloaded for a number of right-hand-side types, and is resolved at compile time; printf() on the other hand must parse the format string at runtime to determine the data type so may incur additional overhead. Either way the amount of code executed within the functions will swamp the call overhead in terms of instructions executed, and will almost certainly be dominated by the OS code required to render the text on a graphical display. So in short you are sweating the small stuff.


movl is move long, 32-bit move

movq is move quad, 64-bit move

printf has a return value, either the number of characters written or -1 on failure, and that value is stored into %eax, that's all the extra line is worrying about.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜