Why and how does GCC compile a function with a missing return statement?

2023-04-01 18:03 Q&A author:

Consider:

#include <stdio.h>

char toUpper(char);

int main(void)
{
    char ch, ch2;
    printf("lowercase input: ");
    ch = getchar();
    ch2 = toUpper(ch);
    printf("%c ==> %c\n", ch, ch2);
    
    return 0;
}

char toUpper(char c)
{
    if(c>='a' && c<='z')
        c = c - 32;
}

In the toUpper function, the return type is char, but there is no "return" statement in toUpper(). I compiled the source code with gcc (GCC) 4.5.1 20100924 (Red Hat 4.5.1-4) on Fedora 14.

Of course, a warning is issued: "warning: control reaches end of non-void function", but the program works well.

What happens to that code when it is compiled with gcc?

When the C program was compiled into assembly language, your toUpper function ended up like this, perhaps:

_toUpper:
LFB4:
        pushq   %rbp
LCFI3:
        movq    %rsp, %rbp
LCFI4:
        movb    %dil, -4(%rbp)
        cmpb    $96, -4(%rbp)
        jle     L8
        cmpb    $122, -4(%rbp)
        jg      L8
        movzbl  -4(%rbp), %eax
        subl    $32, %eax
        movb    %al, -4(%rbp)
L8:
        leave
        ret

The subtraction of 32 was carried out in the %eax register. And in the x86 calling convention, that is the register in which the return value is expected to be! So... you got lucky.

But please pay attention to the warnings. They are there for a reason!
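One way to make sure this warning cannot be overlooked is to promote it to an error. The sketch below assumes GCC's -Werror=return-type option; the function name toUpperChecked is hypothetical and is not part of the original question.

```c
/*
 * A sketch, not the asker's code. Build with, for example:
 *   gcc -Wall -Werror=return-type example.c
 * -Wall enables -Wreturn-type; -Werror=return-type turns that
 * particular warning into a compile error.
 */

/* toUpperChecked is a hypothetical name. The single return expression
 * covers both cases, so no path can reach the closing brace without
 * producing a value. Assumes an ASCII-compatible character set. */
char toUpperChecked(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}
```

With that flag, the original toUpper from the question should fail to compile instead of merely producing a warning.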

It depends on the Application Binary Interface and which registers are used for the computation.

E.g. on x86, the return value is stored in EAX (and, under some register-based calling conventions, so is the first function parameter), so gcc is most likely using this register to store the result of the calculation as well.

Essentially, c is pushed into the spot that should later be filled with the return value; since it's not overwritten by use of return, it ends up as the value returned.

Note that relying on this (in C, or in any other language where this isn't an explicit language feature, as it is in Perl) is a Bad Idea™. In the extreme.

One missing thing that's important to understand is that it's rarely a diagnosable error to omit a return statement. Consider this function:

int f(int x)
{
    if (x!=42) return x*x;
}

As long as you never call it with an argument of 42, a program containing this function is perfectly valid C and does not invoke any undefined behavior, despite the fact that it would invoke UB if you called f(42) and subsequently attempted to use the return value.

As such, while it's possible for a compiler to provide warning heuristics for missing return statements, it's impossible to do so without false positives or false negatives. This is a consequence of the impossibility of solving the halting problem.
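If you know a path can never be taken, one option is to say so explicitly so that no path falls off the end of the function. This is only a sketch: f_checked is a hypothetical name, and calling abort() is just one possible choice for the "impossible" case.

```c
#include <stdlib.h>

/* f_checked is a hypothetical variant of f above. */
int f_checked(int x)
{
    if (x != 42)
        return x * x;

    /* abort() never returns to its caller, so control cannot reach
     * the closing brace without a return value. GCC knows this and
     * should not issue the missing-return warning here. */
    abort();
}
```

Calling f_checked(42) now terminates the program in a defined way instead of handing back an indeterminate value.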

I can't tell you the specifics of your platform as I don't know it, but there is a general answer to the behaviour you see.

When some function that has a return is compiled, the compiler will use a convention on how to return that data. It could be a machine register, or a defined memory location such as via a stack or whatever (though generally machine registers are used). The compiled code may also use that location (register or otherwise) while doing the work of the function.

If the function doesn't return anything, then the compiler will not generate code that explicitly fills that location with a return value. However, as I said above, it may use that location during the function. When you write code that reads the return value (ch2 = toUpper(ch);), the compiler will generate code that follows its convention for retrieving the return value from the conventional location. As far as the calling code is concerned, it will just read the value from that location, even if nothing was explicitly written there. Hence you get a value.

Now look at Ray's example. The compiler used the EAX register to store the results of the upper casing operation. It just so happens, this is probably the location that return values are written to. On the calling side, ch2 is loaded with the value that's in EAX - hence a phantom return. This is only true of the x86 range of processors, as on other architectures the compiler may use a completely different scheme in deciding how the convention should be organised.

However, good compilers will try to optimise according to a set of local conditions, knowledge of the code, rules, and heuristics. So an important thing to note is that it is just luck that this works. The compiler could optimise differently and not do this at all, so you should not rely on the behaviour.

You should keep in mind that such code may crash depending on the compiler. For example, Clang can generate a ud2 instruction at the end of such a function, and your app will then crash at run time.

There are no local variables, so the value on the top of the stack at the end of the function will be the parameter c. The value at the top of the stack upon exiting, is the return value. So whatever c holds, that's the return value.

I have tried a small program:

#include <stdio.h>

int f1() {
}

int main() {
    printf("TEST: <%d>\n",  f1());
    printf("TEST: <%d>\n",  f1());
    printf("TEST: <%d>\n",  f1());
    printf("TEST: <%d>\n",  f1());
    printf("TEST: <%d>\n",  f1());
}

Result:

TEST: <1>
TEST: <10>
TEST: <11>

I have used the MinGW-GCC compiler, so there might be differences.

You could just play around and try, e.g., a char function. As long as you don't use the result value, it will still work fine.

#include <stdio.h>

char f1() {
}

int main() {
    f1();
}

But I would still recommend either declaring the function void or returning a value.

Your function seems to need a return:

char toUpper(char c)
{
    if(c>='a'&&c<='z')
        c = c - 32;
    return c;
}
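As an alternative sketch (not what the asker wrote), the standard library's toupper from <ctype.h> can do the conversion. The wrapper name toUpperStd is hypothetical.

```c
#include <ctype.h>

/* toUpperStd is a hypothetical wrapper around the standard toupper(). */
char toUpperStd(char c)
{
    /* toupper() takes an int whose value must be representable as
     * unsigned char (or be EOF), hence the cast before the call. */
    return (char)toupper((unsigned char)c);
}
```

This avoids the hard-coded 32 and still returns a value on every path.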
