Difference in linkage between C and C++?
I have read the existing questions on external/internal linkage here on SO. My question is different: what happens if I have multiple definitions of the same variable with external linkage in different translation units under C and C++?
For example:
/*file1.c*/
typedef struct foo {
int a;
int b;
int c;
} foo;
foo xyz;
/*file2.c*/
typedef struct abc {
double x;
} foo;
foo xyz;
Using Dev-C++ and compiling as a C program, the above program compiles and links perfectly, whereas it gives a multiple redefinition error if the same code is compiled as a C++ program. Why should it work under C, and what's the difference with C++? Is this behavior undefined and compiler-dependent? How "bad" is this code, and what should I do if I want to refactor it (I've come across a lot of old code written like this)?
Both C and C++ have a "one definition rule" which is that each object may only be defined once in any program. Violations of this rule cause undefined behaviour which means that you may or may not see a diagnostic message when compiling.
There is a language difference between the following declarations at file scope, but it does not directly concern the problem with your example.
int a;
In C this is a tentative definition. It may be amalgamated with other tentative definitions in the same translation unit to form a single definition. In C++ it is always a definition (you have to use extern to declare an object without defining it) and any subsequent definitions of the same object in the same translation unit are an error.
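A minimal sketch of the tentative-definition rule described above, assuming the file is compiled as C (the helper name read_a is made up for illustration). The same text compiled as C++ would be rejected as a redefinition of a.

```c
/* Compile as C. In C++ the repeated "int a;" lines are errors. */

int a;          /* tentative definition */
int a;          /* a second tentative definition of the same object: allowed in C */
extern int a;   /* a pure declaration, never a definition */
int a = 5;      /* the actual definition; the tentative ones are absorbed into it */

/* Hypothetical helper that just reads the single object named a. */
int read_a(void)
{
    return a;
}
```

All four lines refer to one object, so read_a() sees the initializer from the last line.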
In your example both translation units have a (conflicting) definition of xyz from their tentative definitions.
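One common way to end up with a single definition is to declare the object with extern in a shared header and define it in exactly one source file. The sketch below shows that layout collapsed into one listing. The file names in the comments, the initializer values and the helper sum_xyz are assumptions for illustration, not part of the original code.

```c
/* --- xyz.h (hypothetical shared header) --- */
struct foo {
    int a;
    int b;
    int c;
};
extern struct foo xyz;          /* declaration only: xyz is defined elsewhere */

/* --- file1.c: the one and only definition --- */
struct foo xyz = { 1, 2, 3 };

/* --- file2.c: includes xyz.h and uses the object, but does not define it --- */
int sum_xyz(void)
{
    return xyz.a + xyz.b + xyz.c;
}
```

With this layout every translation unit agrees on the type of xyz, and the linker sees one definition under both C and C++.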
This is caused by C++'s name mangling. From Wikipedia:
The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers which produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.
With regards to compatibility:
In order to give compiler vendors greater freedom, the C++ standards committee decided not to dictate the implementation of name mangling, exception handling, and other implementation-specific features. The downside of this decision is that object code produced by different compilers is expected to be incompatible. There are, however, third party standards for particular machines or operating systems which attempt to standardize compilers on those platforms (for example C++ ABI); some compilers adopt a secondary standard for these items.
From http://www.cs.indiana.edu/~welu/notes/node36.html the following example is given:
For example for the below C code
int foo(double*);
double bar(int, double*);
int foo (double* d)
{
return 1;
}
double bar (int i, double* d)
{
return 0.9;
}
Its symbol table would be (by dump -t)
[4] 0x18 44 2 1 0 0x2 bar
[5] 0x0 24 2 1 0 0x2 foo
For the same file, if compiled with g++, the symbol table would be
[4] 0x0 24 2 1 0 0x2 _Z3fooPd
[5] 0x18 44 2 1 0 0x2 _Z3bariPd
_Z3bariPd means a function whose name is bar and whose first argument is an int and second argument is a pointer to double.
C++ does not allow a symbol to be defined more than once. I'm not sure what the C linker is doing; a good guess is that it simply maps both definitions onto the same symbol, which would of course cause severe errors.
For porting I would try to put the contents of individual C-files into anonymous namespaces, which essentially makes the symbols different, and local to the file, so they don't clash with the same name elsewhere.
The C program permits this and treats the memory a little like a union. It will run, but may not give you what you expected.
The C++ program (which is stronger typed) correctly detects the problem and asks you to fix it. If you really want a union, declare it as one. If you want two distinct objects, limit their scope.
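Both suggestions can be sketched in C. This is an illustration under assumptions, not the original code: the names xyz_storage, local_xyz and the two helper functions are invented, and everything is shown in one listing, whereas in the real project each static object would live in its own source file and could keep the name xyz.

```c
struct foo { int a; int b; int c; };
struct abc { double x; };

/* Option 1: the overlap is intentional, so state it with a union. */
union xyz_storage {
    struct foo as_foo;
    struct abc as_abc;
};
union xyz_storage xyz;              /* one object, one definition */

/* Option 2: the objects are unrelated. "static" gives internal linkage,
 * so the name is private to this translation unit and cannot clash
 * with an object of the same name in another file. */
static struct abc local_xyz;

/* Hypothetical helper: writes and reads the file-local object. */
double set_and_get_local(double v)
{
    local_xyz.x = v;
    return local_xyz.x;
}

/* Hypothetical helper: writes and reads the shared union through one member. */
int set_and_get_shared(int v)
{
    xyz.as_foo.a = v;
    return xyz.as_foo.a;
}
```

Option 2 is the C counterpart of limiting scope: the linker never sees the static name, so two files can each have their own object.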
You have found the One Definition Rule. Clearly your program has a bug, since
1. There can only be one object named foo once the program is linked.
2. If some source file includes all the header files, it will see two definitions of foo.
C++ compilers can get around #1 because of "name mangling": the name of your variable in the linked program may be different from the one you chose. In this case, it isn't required, but it's probably how your compiler detected the problem. #2, though, remains, so you can't do that.
If you really want to defeat the safety mechanism, you can disable mangling like this:
extern "C" struct abc foo;
… other file …
extern "C" struct foo foo;
extern "C" instructs the compiler to give the name C linkage, so it follows C ABI conventions and is not mangled.