Why and how does GCC compile a function with a missing return statement?
Consider:
#include <stdio.h>
char toUpper(char);
int main(void)
{
char ch, ch2;
printf("lowercase input: ");
ch = getchar();
ch2 = toUpper(ch);
printf("%c ==> %c\n", ch, ch2);
return 0;
}
char toUpper(char c)
{
if(c>='a' && c<='z')
c = c - 32;
}
In the toUpper function, the return type is char, but there isn't any &q开发者_开发百科uot;return" in toUpper(). And compile the source code with gcc (GCC) 4.5.1 20100924 (Red Hat 4.5.1-4), Fedora 14.
Of course, a warning is issued: "warning: control reaches end of non-void function", but, working well.
What has happened in that code during compile with gcc?
When the C program was compiled into assembly language, your toUpper function ended up like this, perhaps:
_toUpper:
LFB4:
pushq %rbp
LCFI3:
movq %rsp, %rbp
LCFI4:
movb %dil, -4(%rbp)
cmpb $96, -4(%rbp)
jle L8
cmpb $122, -4(%rbp)
jg L8
movzbl -4(%rbp), %eax
subl $32, %eax
movb %al, -4(%rbp)
L8:
leave
ret
The subtraction of 32 was carried out in the %eax register. And in the x86 calling convention, that is the register in which the return value is expected to be! So... you got lucky.
But please pay attention to the warnings. They are there for a reason!
It depends on the Application Binary Interface and which registers are used for the computation.
E.g. on x86, the first function parameter and the return value is stored in EAX
and so gcc is most likely using this to store the result of the calculation as well.
Essentially, c
is pushed into the spot that should later be filled with the return value; since it's not overwritten by use of return
, it ends up as the value returned.
Note that relying on this (in C, or any other language where this isn't an explicit language feature, like Perl), is a Bad Idea™. In the extreme.
One missing thing that's important to understand is that it's rarely a diagnosable error to omit a return statement. Consider this function:
int f(int x)
{
if (x!=42) return x*x;
}
As long as you never call it with an argument of 42, a program containing this function is perfectly valid C and does not invoke any undefined behavior, despite the fact that it would invoke UB if you called f(42)
and subsequently attempted to use the return value.
As such, while it's possible for a compiler to provide warning heuristics for missing return statements, it's impossible to do so without false positives or false negatives. This is a consequence of the impossibility of solving the halting problem.
I can't tell you the specifics of your platform as I don't know it, but there is a general answer to the behaviour you see.
When some function that has a return is compiled, the compiler will use a convention on how to return that data. It could be a machine register, or a defined memory location such as via a stack or whatever (though generally machine registers are used). The compiled code may also use that location (register or otherwise) while doing the work of the function.
If the function doesn't return anything, then the compiler will not generate code that explicitly fills that location with a return value. However, like I said above, it may use that location during the function. When you write code that reads the return value (ch2 = toUpper(ch);)
, the compiler will write code that uses its convention on how retrieve that return from the conventional location. As far as the caller code is concerned, it will just read that value from the location, even if nothing was written explicitly there. Hence you get a value.
Now look at Ray's example. The compiler used the EAX register to store the results of the upper casing operation. It just so happens, this is probably the location that return values are written to. On the calling side, ch2 is loaded with the value that's in EAX - hence a phantom return. This is only true of the x86 range of processors, as on other architectures the compiler may use a completely different scheme in deciding how the convention should be organised.
However, good compilers will try optimise according to a set of local conditions, knowledge of code, rules, and heuristics. So an important thing to note is that this is just luck that it works. The compiler could optimise and not do this or whatever - you should not reply on the behaviour.
You should keep in mind that such code may crash depending on the compiler. For example, Clang generates a ud2 instruction at the end of such function and your app will crash at run time.
There are no local variables, so the value on the top of the stack at the end of the function will be the parameter c. The value at the top of the stack upon exiting, is the return value. So whatever c holds, that's the return value.
I have tried a small program:
#include <stdio.h>
int f1() {
}
int main() {
printf("TEST: <%d>\n", f1());
printf("TEST: <%d>\n", f1());
printf("TEST: <%d>\n", f1());
printf("TEST: <%d>\n", f1());
printf("TEST: <%d>\n", f1());
}
Result:
TEST: <1>
TEST: <10>
TEST: <11>
TEST: <11>
TEST: <11>
I have used the MinGW-GCC compiler, so there might be differences.
You could just play around and try, e.g., a char function. As long you don't use the result value, it will still work fine.
#include <stdio.h>
char f1() {
}
int main() {
f1();
}
But I still would recommend to set either void function or give some return value.
Your function seems to need a return:
char toUpper(char c)
{
if(c>='a'&&c<='z')
c = c - 32;
return c;
}
精彩评论