
What is the reason function names are prefixed with an underscore by the compiler?

When I look at the assembly generated for a C program, like this:

emacs hello.c
clang -S -O hello.c -o hello.s
cat hello.s

Function names are prefixed with an underscore (e.g. callq _printf). Why is this done and what advantages does it have?


Example:

hello.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main() {
  char *myString = malloc(strlen("Hello, World!") + 1);
  memcpy(myString, "Hello, World!", strlen("Hello, World!") + 1);
  printf("%s", myString);
  return 0;
}

hello.s

_main:                       ; Here
Leh_func_begin0:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl    $14, %edi
    callq   _malloc          ; Here
    movabsq $6278066737626506568, %rcx
    movq    %rcx, (%rax)
    movw    $33, 12(%rax)
    movl    $1684828783, 8(%rax)
    leaq    L_.str1(%rip), %rdi
    movq    %rax, %rsi
    xorb    %al, %al
    callq   _printf          ; Here
    xorl    %eax, %eax
    popq    %rbp
    ret
Leh_func_end0:


From Linkers and Loaders:

At the time that UNIX was rewritten in C in about 1974, its authors already had extensive assembler language libraries, and it was easier to mangle the names of new C and C-compatible code than to go back and fix all the existing code. Now, 20 years later, the assembler code has all been rewritten five times, and UNIX C compilers, particularly ones that create COFF and ELF object files, no longer prepend the underscore.

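Prepending an underscore to C symbols in the generated assembly is just a name-mangling convention that arose as a workaround. It stuck around on some platforms for (as far as I know) no particular technical reason, and Clang follows it wherever the target's convention calls for it, which appears to be the case for the Mach-O style output in the question.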

Outside of assembly, C standard library implementations often prefix their internal, implementation-defined functions with an underscore to signal "this is magic, don't touch" to the ordinary programmers who stumble across them.


A lot of compilers used to translate C to assembly language and then run an assembler on the result to generate an object file, which is a lot easier than generating binary code directly. (AFAIK GCC still does this, typically passing its output to the GNU assembler.) During this translation, function names become labels in the assembly source. If you have a function called (for example) ret, though, some assemblers can get confused and think it's an instruction rather than a label. (YASM does, for example, mostly because labels can appear pretty much anywhere and don't require colons. You have to prepend a $ if you want a label called ret.)

Prepending a character (like, say, an underscore) to the C-generated labels was a whole lot easier than writing one's own C-friendly assembler or worrying about labels clashing with assembly instructions/directives.

These days, assemblers and compilers have evolved a bit, and most people work at the C level or higher anyway. So the original need to mangle names in C is largely gone.


At first glance, the operating system is a Unix or Unix-like system running on a PC. In my view, it is not very surprising to find _printf in the generated assembly. C's printf is a function that performs I/O, so it is ultimately the responsibility of the kernel and driver to carry out the requested I/O.

The path the machine instructions take on any Unix or Unix-like OS is the following:

printf (C code)-> _printf (libc) -> trap -> kernel + driver work -> return from trap -> return from _printf (libc) -> printf completion and return -> next machine instruction in C code

In this assembly extract, it looks like the compiler inlined the C printf, which made the _printf entry point visible in the assembly code.

To make sure the C printf is not being decorated with a prefix (an underscore in this case), search all C headers for _printf with a command like:

find /usr/include -name '*.h' -exec grep _printf {} \; -print
