
Tail call recursion

I'm implementing a function as follows:

void Add(list* node)
{
    if(this->next == NULL)
        this->next = node;
    else
        this->next->Add(node);
}

It seems Add will be tail-called at every step of the recursion.

I could also implement it as:

void Add(list *node)
{
    list *curr = this;
    while(curr->next != NULL) curr = curr->next;
    curr->next = node;
}

This will not use recursion at all.

Which version is better (in stack usage or speed)?

Please don't give the "Why don't use STL/Boost/whatever?" comments/answers.


Performance-wise they will probably be the same, since the compiler is likely to optimise both into exactly the same code.

However, if you compile with debug settings, the compiler will not perform tail-call optimisation, so if the list is long enough you can get a stack overflow. There is also the (very small) possibility that a poor compiler won't optimise the recursive version into a loop. There is no such risk with the iterative version.

Pick whichever one is clearer and easier for you to maintain, taking the possibility of non-optimisation into account.


I tried it out, making three files to test your code:

node.hh:

struct list {
  list *next;
  void Add(list *);
};

tail.cc:

#include "node.hh"

void list::Add(list* node)
{
    if(!this->next)
        this->next = node;
    else
        this->next->Add(node);
}

loop.cc:

#include "node.hh"

void list::Add(list *node)
{
    list *curr = this;
    while(curr->next) curr = curr->next;
    curr->next = node;
}

Compiled both files with G++ 4.3 for IA32, using -O3 and -S to produce assembly output rather than object files.

Results:

tail.s:

_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %eax
        .p2align 4,,7
        .p2align 3
.L2:
        movl    %eax, %edx
        movl    (%eax), %eax
        testl   %eax, %eax
        jne     .L2
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc

loop.s:

_ZN4list3AddEPS_:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl   %ebp
        .cfi_def_cfa_offset 8
        movl    %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        movl    8(%ebp), %edx
        jmp     .L3
        .p2align 4,,7
        .p2align 3
.L6:
        movl    %eax, %edx
.L3:
        movl    (%edx), %eax
        testl   %eax, %eax
        jne     .L6
        movl    12(%ebp), %eax
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .cfi_endproc

Conclusion: The output is similar enough (the core loop/recursion becomes movl, movl, testl, jne in both) that it really isn't worth worrying about. There's one less unconditional jump in the recursive version, although I wouldn't want to bet either way on which is faster, if the difference is even measurable at all. Pick whichever most naturally expresses the algorithm at hand; even if you later decide that was a bad choice, it's not too hard to switch.

Adding -g to the compilation doesn't change the actual implementation with g++ either, although there is the added complication that breakpoints no longer behave as you would expect: in my tests with GDB, a breakpoint on the tail-call line gets hit at most once (and not at all for a one-element list), regardless of how deep the recursion actually goes.

Timings:

Out of curiosity I ran some timings with the same variant of g++. I used:

#include <cstring>
#include "node.hh"

static const unsigned int size = 2048;
static const unsigned int count = 10000;

int main() {
   list nodes[size];
   for (unsigned int i = 0; i < count; ++i) {
      std::memset(nodes, 0, sizeof(nodes));
      for (unsigned int j = 1; j < size; ++j) {
        nodes[0].Add(&nodes[j]);
      }
   }
}

This was run 200 times with each of the loop and tail-call versions. The results with this compiler on this platform were fairly conclusive: the tail-call version had a mean of 40.52 seconds, whereas the loop version had a mean of 66.93 seconds (standard deviations 0.45 and 0.47 respectively).


So I certainly wouldn't be scared of using tail-call recursion if it seems the nicer way of expressing the algorithm, but I wouldn't go out of my way to use it either, since these timing observations would most likely vary across platforms and compiler versions.
