Why isn't g++ tail call optimizing while gcc is?
I wanted to check whether g++ supports tail calling so I wrote this simple program to check it: http://ideone.com/hnXHv
using namespace std;
size_t st;
void PrintStackTop(const std::string &type)
{
int stack_top;
if(st == 0) st = (size_t) &stack_top;
cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}
int TailCallFactorial(int n, int a = 1)
{
PrintStackTop("tail");
if(n < 2)
return a;
return TailCallFactorial(n - 1, n * a);
}
int NormalCallFactorial(int n)
{
PrintStackTop("normal");
if(n < 2)
return 1;
return NormalCallFactorial(n - 1) * n;
}
int main(int argc, char *argv[])
{
st = 0;
cout << TailCallFactorial(5) << endl;
st = 0;
cout << NormalCallFactorial(5) << endl;
return 0;
}
When I compiled it normally it seems g++ doesn't really notice any difference between the two versions:
> g++ main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 48
In tail call version, the stack top is: 96
In tail call version, the stack top is: 144
In tail call version, the stack top is: 192
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120
The stack difference is 48 in both of them, while the tail call version needs one more int. (Why?)
So I thought optimization might be handy:> g++ -O2 main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 80
In tail call version, the stack top is: 160
In tail call version, the stack top is: 240
In tail call version, the stack top is: 320
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 64
In normal call version, the stack top is: 128
In normal call version, the stack top is: 192
In normal call version, the stack top is: 256
120
The stack size increased in both cases, and while the compiler might think my CPU is slower than my memory (which its not anyway), I don't know why 80 bytes are necessary for a simple function. (Why is it?).
There tail call version also takes more space than the normal version, and its completely logical if an int has the size of 16 bytes. (no, I don't own a 128 bit CPU). Now thinking what reason the compiler has not to tail call, I thought it might be exceptions, because they depend on the stack tightly. So I tried without exceptions:> g++ -O2 -fno-exceptions main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 64
In tail call version, the stack top is: 128
In tail call version, the stack top is: 192
In tail call version, the stack top is: 256
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120
Which cut the normal version back to non-optimized stack size, while the optimized one has 8 bytes over it. still an int is not 8 bytes.
I thought there is something I missed in c++ that needs the stack arranged so I tried c: http://ideone.com/tJPpc Still no tail calling, but the stack is much smaller (32 bit each frame in both version). Then I tried with optimization:> gcc -O2 main.c -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
120
In normal call version, the stack top i开发者_如何学Gos: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
120
Not only it tail call optimized the first, it also tail call optimized the second!
Why doesn't g++ do tail call optimization while its clearly available in the platform? is there any way to force it?Because you're passing a temporary std::string object to the PrintStackTop(std::string) function. This object is allocated on the stack and thus prevent the tail call optimization.
I modified your code:
void PrintStackTopStr(char const*const type)
{
int stack_top;
if(st == 0) st = (size_t) &stack_top;
cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}
int RealTailCallFactorial(int n, int a = 1)
{
PrintStackTopStr("tail");
if(n < 2)
return a;
return RealTailCallFactorial(n - 1, n * a);
}
Compile with: g++ -O2 -fno-exceptions -o tailcall tailcall.cpp
And it now uses the tail call optimisation. You can see it in action if you use the -S flag to produce the assembly:
L39:
imull %ebx, %esi
subl $1, %ebx
L38:
movl $LC2, (%esp)
call __Z16PrintStackTopStrPKc
cmpl $1, %ebx
jg L39
You see the recursive call inlined as a loop (jg L39).
I don't find the other answer satisfying because a local object has no effect on the stack once it's gone.
Here is a good article which mentions that the lifetime of local objects extends into the tail-called function. Tail call optimization requires destroying locals before relinquishing control, GCC will not apply it unless it is sure that no local object will be accessed by the tail call.
Lifetime analysis is hard though, and it looks like it's being done too conservatively. Setting a global pointer to reference a local disables TCO even if the local's lifetime (scope) ends before the tail call.
{
int x;
static int * p;
p = & x;
} // x is dead here, but the enclosing function still has TCO disabled.
This still doesn't seem to model what's happening, so I found another bug. Passing local to a parameter with a user-defined or non-trivial destructor also disables TCO. (Defining the destructor = delete
allows TCO.)
std::string
has a nontrivial destructor, so that's causing the issue here.
The workaround is to do these things in a nested function call, because lifetime analysis will then be able to tell that the object is dead by the tail call. But there's no need to forgo all C++ objects.
The original code with temporary std::string
object is still tail recursive, since the destructor for that object is executed immediately after exit from PrintStackTop("");
, so nothing should be executed after the recursive return
statement.
However, there are two issues that lead to confusion of tail call optimization (TCO):
- the argument is passed by reference to the
PrintStackTop
function - non-trivial destructor of std::string
It can be verified by custom class that each of those two issues is able to break TCO.
As it is noted in the previous answer by @Potatoswatter there is a workaround for both of those issues. It is enough to wrap call of PrintStackTop
by another function to help the compiler to perform TCO even with temporary std::string
:
void PrintStackTopTail()
{
PrintStackTop("tail");
}
int TailCallFactorial(int n, int a = 1)
{
PrintStackTopTail();
//...
}
Note that is not enough to limit the scope by enclosing { PrintStackTop("tail"); }
in curly braces. It must be enclosed as a separate function.
Now it can be verified with g++ version 4.7.2 (compilation options -O2) that tail recursion is replaced by a loop.
The similar issue is observed in Pass-by-reference hinders gcc from tail call elimination
Note that printing (st - (size_t) &stack_top)
is not enough to be sure that TCO is performed, for example with the optimization option -O3 the function TailCallFactorial
is self inlined five times, so TailCallFactorial(5)
is executed as a single function call, but the issue is revealed for larger argument values (for example for TailCallFactorial(15);
). So, the TCO may be verified by reviewing assembly output generated with the -S flag.
精彩评论