Tail call recursion
I'm implementing a function as follows:

void Add(list *node)
{
    if (this->next == NULL)
        this->next = node;
    else
        this->next->Add(node);
}
As it seems, Add is going to be tail-called at every step of the recursion. Alternatively, I could write it iteratively:
void Add(list *node)
{
    list *curr = this;
    while (curr->next != NULL) curr = curr->next;
    curr->next = node;
}
This version doesn't use recursion at all.

Which version is better (in stack size or speed)? Please don't give "why don't you use STL/Boost/whatever?" comments/answers.

They will probably perform the same, since the compiler is likely to optimise them into exactly the same code.
However, if you compile with debug settings, the compiler will not optimise the tail recursion, so if the list is long enough you can get a stack overflow. There is also the (very small) possibility that a poor compiler won't perform the optimisation even in release builds. There is no such risk with the iterative version.
Pick whichever one is clearer and easier for you to maintain, taking the possibility of non-optimisation into account.
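If you want to check whether your compiler actually eliminates the tail call, one quick way (a sketch, assuming g++ is on your PATH; the file contents are recreated inline so it runs standalone, and the mangled name follows the Itanium C++ ABI) is to look for a surviving recursive call in the generated assembly:

```shell
# Sketch: detect whether g++ turned the tail call into a loop.
cat > node.hh <<'EOF'
struct list {
    list *next;
    void Add(list *);
};
EOF
cat > tail.cc <<'EOF'
#include "node.hh"
void list::Add(list *node)
{
    if (!this->next)
        this->next = node;
    else
        this->next->Add(node);   // tail call
}
EOF
g++ -O3 -S tail.cc -o tail.s
# A surviving recursive call would appear as "call _ZN4list3AddEPS_";
# its absence means the tail call became a jump/loop.
if grep -q 'call.*_ZN4list3AddEPS_' tail.s; then
    echo "tail call NOT optimised"
else
    echo "tail call optimised"
fi
```

The same check at -O0 would typically report the call as still present, which is exactly the debug-build hazard described above.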
I tried it out, making three files to test your code:
node.hh:
struct list {
    list *next;
    void Add(list *);
};
tail.cc:
#include "node.hh"
void list::Add(list *node)
{
    if (!this->next)
        this->next = node;
    else
        this->next->Add(node);
}
loop.cc:
#include "node.hh"
void list::Add(list *node)
{
    list *curr = this;
    while (curr->next) curr = curr->next;
    curr->next = node;
}
I compiled both files with G++ 4.3 for IA32, using -O3 and -S to produce assembly output rather than object files.
Results:
tail.s:
_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %eax
        .p2align 4,,7
        .p2align 3
.L2:
        movl    %eax, %edx
        movl    (%eax), %eax
        testl   %eax, %eax
        jne     .L2
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc
loop.s:
_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %edx
        jmp     .L3
        .p2align 4,,7
        .p2align 3
.L6:
        movl    %eax, %edx
.L3:
        movl    (%edx), %eax
        testl   %eax, %eax
        jne     .L6
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc
Conclusion: The output is similar enough (the core loop/recursion compiles to movl, movl, testl, jne in both) that it really isn't worth worrying about. The recursive version has one fewer unconditional jump, though I wouldn't want to bet either way on which is faster, if the difference is measurable at all. Pick whichever expresses the algorithm at hand most naturally. Even if you later decide that was a bad choice, it's not hard to switch.
Adding -g to the compilation doesn't change the actual implementation with g++ either, although there is the added complication that breakpoints no longer behave as you would expect: in my tests with GDB, a breakpoint on the tail-call line gets hit at most once (and not at all for a one-element list), regardless of how deep the recursion actually goes.
Timings:
Out of curiosity I ran some timings with the same variant of g++. I used:
#include <cstring>
#include "node.hh"

static const unsigned int size = 2048;
static const unsigned int count = 10000;

int main() {
    list nodes[size];
    for (unsigned int i = 0; i < count; ++i) {
        std::memset(nodes, 0, sizeof(nodes));
        for (unsigned int j = 1; j < size; ++j) {
            nodes[0].Add(&nodes[j]);
        }
    }
}
This was run 200 times for each of the loop and tail-call versions. The results with this compiler on this platform were fairly conclusive: tail had a mean of 40.52 seconds, whereas loop had a mean of 66.93 seconds (standard deviations 0.45 and 0.47 respectively).
So I certainly wouldn't be scared of using tail-call recursion if it seems the nicer way of expressing the algorithm, but I probably wouldn't go out of my way to use it either, since I suspect these timing observations would vary across platforms and compiler versions.