Templated C++ Object Files
Lets say I have two .cpp files, file1.cpp and file2.cpp, which use std::vector<int>
. Suppose that file1.cpp has a int main(void)
. If I compiled both into file1.o and file2.o, and linked the two object files into an elf binary which 开发者_如何学CI can execute. I am compiling on a 32-bit Ubuntu Linux machine.
My question regards how the compiler and linker put together the symbols for the std::vector:
- When the linker makes my final binary, is there code duplication? Does the linker have one set of "templated" code for the code in f1.o that uses
std::vector
and another set ofstd::vector
code for the code that comprises f2.o?
I tried this for myself (I used g++ -g
) and I looked at my final executable disassembly, and I found the labels generated for the vector constructor and other methods were apparently random, although the code from f1.o appeared to have called the same constructor as the code from f2.o. I could not be sure, however.
If the linker does prevent the code duplication, how does it do it? Must it "know" what templates are? Does it always prevent code duplication regarding multiple uses of the same templated code across multiple object files?
It knows what the templates are through name mangling. The type of the object is encoded by the compiler in its name, and that allows the linker to filter out the duplicate implementations of the same template.
This is done during linking, and not compilation, because each .o file can be linked with anything thus cannot be stripped of something that may later be needed. Only the linker can decide which code is unused, which template is duplicate, etc. This is done by using "Weak Symbols" in the object's symbol list: Symbols that the linker can remove if they appear multiple times (as opposed to other symbols, like user-defined functions, that cannot be removed if duplicate and cause a linking error).
Your question is stated verbatim in the opening section of this documentation:
http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html
Technically due to the "one definition rule" there is only one std::vector<int>
and therefore the code should be linked together. What may happen is that some code is inlined which would speed up execution time but could produce more code.
If you had one file using std::vector<int>
and another using std::vector<unsigned int>
then you would have 2 classes and potentially lots of duplicate code.
Of course the writers of vector might use some common code for certain situations eg POD types that removes the duplication.
精彩评论