Assembly versus binary output
Is it better for a compiler to compile code to assembly language, or output binary machine code directly?
Advantages of assembly language that I can think of off the top of my head: avoiding the need to learn the object file format, ease of debugging the backend.
Advantage of binary: faster compile speed. How significant is this? Assuming the GNU assembler is used (apart from anything else, it's what can reasonably be assumed to be available on most machines), does it take a significant amount of time to assemble, say, a million lines of code?
Are there any differences in object file formats between various operating systems that the assembler would hide?
Are there any other advantages on either side that I haven't thought of?
Assembly is easier to output and it has the benefit of being human readable. As for time of compilation, here are some statistics from my compiler:
[~/ecc/ellcc/ecc/Main] main% ../../bin/x86-elf-ecc test/sieve.c -time-actions
===-------------------------------------------------------------------------===
... Ellcc action timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 2.9006 seconds (2.9857 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
2.0397 ( 71.3%) 0.0250 ( 65.8%) 2.0647 ( 71.2%) 2.1174 ( 70.9%) Bitcode linking
0.7999 ( 27.9%) 0.0070 ( 18.4%) 0.8069 ( 27.8%) 0.8111 ( 27.2%) Generating
0.0000 ( 0.0%) 0.0010 ( 2.6%) 0.0010 ( 0.0%) 0.0274 ( 0.9%) Assembly
0.0110 ( 0.4%) 0.0030 ( 7.9%) 0.0140 ( 0.5%) 0.0143 ( 0.5%) LLVM generation
0.0070 ( 0.2%) 0.0000 ( 0.0%) 0.0070 ( 0.2%) 0.0066 ( 0.2%) Type checking
0.0000 ( 0.0%) 0.0020 ( 5.3%) 0.0020 ( 0.1%) 0.0041 ( 0.1%) Linking
0.0030 ( 0.1%) 0.0000 ( 0.0%) 0.0030 ( 0.1%) 0.0031 ( 0.1%) Optimization
0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) Elaboration
0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0004 ( 0.0%) Integrity checking
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Parsing
2.8626 (100.0%) 0.0380 (100.0%) 2.9006 (100.0%) 2.9857 (100.0%) TOTAL
[~/ecc/ellcc/ecc/Main] main%
As you can see, the assembly time (about 0.03 seconds of wall-clock time, under 1% of the total) is dwarfed by bitcode linking and code generation. This example compiles and links together a small main() along with the standard library, all in LLVM intermediate form. A single assembly language file is then generated for the whole program. This file is linked (actually relocated) using the linker, which creates the a.out file.
Another advantage of assembly: Ability to use labels for jumps, loops, branches and function calls so you don't need to manually calculate memory addresses.
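To make that concrete, here is a rough sketch of the bookkeeping a compiler has to do itself when it emits binary directly. The Emitter class and its method names are made up for illustration. The byte encodings are borrowed from x86 (0x90 for nop, 0xE9 followed by a 32-bit displacement relative to the end of the instruction for jmp), but this is a toy, not a real backend.

```python
import struct


class Emitter:
    """Hypothetical byte emitter that resolves jump labels by backpatching."""

    def __init__(self):
        self.code = bytearray()
        self.labels = {}   # label name -> byte offset of the label
        self.fixups = []   # (offset of displacement field, label name)

    def label(self, name):
        # Record the current offset as the address of this label.
        self.labels[name] = len(self.code)

    def nop(self):
        self.code.append(0x90)

    def jmp(self, name):
        # Emit the opcode now and leave a 4-byte hole for the displacement,
        # because a forward label's offset is not known yet.
        self.code.append(0xE9)
        self.fixups.append((len(self.code), name))
        self.code += b"\x00\x00\x00\x00"

    def finish(self):
        # Backpatch every hole once all labels are known.
        for pos, name in self.fixups:
            target = self.labels[name]
            disp = target - (pos + 4)  # relative to the end of the jmp
            struct.pack_into("<i", self.code, pos, disp)
        return bytes(self.code)


def build_example():
    e = Emitter()
    e.label("top")
    e.nop()
    e.jmp("end")   # forward reference, patched in finish()
    e.nop()
    e.jmp("top")   # backward reference
    e.label("end")
    e.nop()
    return e.finish()


if __name__ == "__main__":
    print(build_example().hex())
```

With assembly output, all of this collapses to writing "jmp end" and "end:" as text and letting the assembler do the arithmetic.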
If you generate assembler code, then you will end up
- writing that code to disk, and the assembler will have to read it;
- calling the assembler, i.e. starting up a new process.
The assembler itself runs quickly, but the file I/O will take a moment or two. A million lines? Maybe 5 seconds. Firing up the assembler will take, say, 100 to 1000 ms. No big deal there.
I think the greater ease of debugging and lifting the need to fiddle with the object format will easily make up for the slightly longer compilation times.
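If you would rather measure than guess, something like the following sketch times the two costs separately: writing the assembly text to disk, and running the assembler over it. The function names are made up, the input is just a long run of nop instructions (so it says nothing about realistic code), and it assumes an assembler called "as" that accepts "as file.s -o file.o", which is how the GNU assembler is invoked.

```python
import os
import shutil
import subprocess
import tempfile
import time


def write_asm(path, n_lines):
    """Write a trivial assembly file containing n_lines nop instructions."""
    with open(path, "w") as f:
        f.write(".text\n")
        for _ in range(n_lines):
            f.write("nop\n")


def time_assemble(n_lines, assembler="as"):
    """Return (write_seconds, assemble_seconds).

    assemble_seconds is None when the assembler is not found on PATH.
    """
    exe = shutil.which(assembler)
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.s")
        obj = os.path.join(tmp, "big.o")

        t0 = time.perf_counter()
        write_asm(src, n_lines)
        t1 = time.perf_counter()
        if exe is None:
            return t1 - t0, None

        # Includes process startup, reading the file, and assembling it.
        subprocess.run([exe, src, "-o", obj], check=True)
        t2 = time.perf_counter()
        return t1 - t0, t2 - t1


if __name__ == "__main__":
    write_s, asm_s = time_assemble(1_000_000)
    print("write:", write_s, "assemble:", asm_s)
```

Running it with a tiny line count gives a rough idea of the fixed process startup cost, and the million-line run shows how the total scales.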
The main advantage of generating binary directly is that you can spray your code straight into memory, flush the I-cache, and then branch to it. This means you can create a nice interactive loop using your native-code compiler. A nice feature to have, and deployed for over 20 years in the Standard ML of New Jersey compiler.
Are there any differences in object file formats between various operating systems that the assembler would hide?
Yes, even on the same operating system, you can have several object file formats. (For instance, MASM can generate either OMF or COFF object files, for use with different linkers.)
More on the different object file formats can be found in the respective section in this document.
You can measure how long it takes to generate assembly for your code:
time gcc -O2 -S foo.c