Using virtual functions versus static_cast from base to derived
I am trying to understand which implementation below is "faster". Assume that one compiles this code with and without the -DVIRTUAL flag.
I assume that compiling without -DVIRTUAL will be faster because:
a] There is no vtable used
b] The compiler might be able to optimize the assembly instructions because it "knows" exactly which call will be made given the various options (there are only a finite number of options).
My question is PURELY related to speed, not pretty code.
a] Am I correct in my analysis above?
b] Will the branch predictor / compiler combination be intelligent enough to optimize for a given branch of the switch statement? Note that "type" is a const int.
c] Are there any other factors that I am missing?
Thanks!
#include <iostream>

class Base
{
public:
    Base(int t) : type(t) {}
    ~Base() {}
    const int type;
#ifdef VIRTUAL
    virtual void fn1()=0;
#else
    void fn2();
#endif
};

class Derived1 : public Base
{
public:
    Derived1() : Base(1) { }
    ~Derived1() {}
    void fn1() { std::cout << "in Derived1()" << std::endl; }
};

class Derived2 : public Base
{
public:
    Derived2() : Base(2) { }
    ~Derived2() { }
    void fn1() { std::cout << "in Derived2()" << std::endl; }
};

#ifndef VIRTUAL
void Base::fn2()
{
    switch(type)
    {
    case 1:
        (static_cast<Derived1* const>(this))->fn1();
        break;
    case 2:
        (static_cast<Derived2* const>(this))->fn1();
        break;
    default:
        break;
    }
}
#endif

int main()
{
    Base *test = new Derived1();
#ifdef VIRTUAL
    test->fn1();
#else
    test->fn2();
#endif
    return 0;
}
I think you misunderstand the vtable. In most implementations the vtable is simply a jump table (though, as far as I know, the standard does not guarantee this). In fact, you could go as far as saying it's a giant switch statement. As such, I'd wager the speed would be exactly the same with both of your methods.
If anything I'd imagine the VTable method would be slightly faster as the compiler can make better decisions to optimise for cache alignment and so forth...
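To make the jump-table comparison concrete, here is a minimal sketch of a hand-written table of function pointers indexed by the type tag, next to the equivalent switch. Everything in it (derived1_fn1, derived2_fn1, kTable, dispatch_by_table, dispatch_by_switch) is a hypothetical name made up for illustration, and the functions return strings rather than printing so the result is easy to check. It is not how any particular compiler lays out its vtable; it only shows the "load an address, then call it" shape.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the bodies of Derived1::fn1 and Derived2::fn1.
std::string derived1_fn1() { return "in Derived1()"; }
std::string derived2_fn1() { return "in Derived2()"; }

using Fn = std::string (*)();

// Hand-written "jump table": slot i holds the function for type tag i + 1.
constexpr Fn kTable[] = { &derived1_fn1, &derived2_fn1 };
constexpr std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

// Table dispatch: one load from the table followed by one indirect call.
// The assert is a debug-only bounds check and disappears under NDEBUG.
std::string dispatch_by_table(int type)
{
    assert(type >= 1 && static_cast<std::size_t>(type) <= kTableSize);
    return kTable[type - 1]();
}

// The same dispatch written as a switch, as in the question.
// A compiler is free to lower this to compares and branches or to a table.
std::string dispatch_by_switch(int type)
{
    switch (type)
    {
    case 1: return derived1_fn1();
    case 2: return derived2_fn1();
    default: return std::string();
    }
}
```

Both functions give the same answer for tags 1 and 2; which one ends up faster is exactly the thing that depends on the compiler and the target.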
Have you measured the performance to see if there's even any difference at all?
I suppose not, because then you wouldn't be asking here. It's the only reasonable response though.
Assuming that you are not prematurely micro-optimizing pointlessly, and you have profiled your code and found this to be a problem that needs solving, the best way to figure out the answer to your question is to compile both in release with full optimizations and examine the generated machine code.
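If you do want to measure, a rough timing harness could look like the sketch below. It is only a starting point under several assumptions: the types (VBase, VDerived1, VDerived2, Tagged), the helpers (time_it, run_virtual, run_switch) and the trivial "add a constant" workload are all invented for this example, and the switch variant is a simplified tag dispatch rather than the exact static_cast code from the question. Note also that the two variants store their objects differently (pointers versus values), which by itself can affect the timings, and that an optimizer may simplify loops this small. Treat any numbers with suspicion and confirm them by reading the generated machine code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Virtual-dispatch variant (hypothetical workload: add a constant).
struct VBase {
    virtual ~VBase() = default;
    virtual void fn1(long& acc) const = 0;
};
struct VDerived1 : VBase { void fn1(long& acc) const override { acc += 1; } };
struct VDerived2 : VBase { void fn1(long& acc) const override { acc += 2; } };

// Tag-plus-switch variant, simplified from the question's fn2().
struct Tagged {
    int type;
    void fn2(long& acc) const {
        switch (type) {
        case 1: acc += 1; break;
        case 2: acc += 2; break;
        default: break;
        }
    }
};

struct Result { long sum; double seconds; };

// Runs body() once and reports its return value and wall-clock duration.
template <class F>
Result time_it(F body) {
    const auto start = std::chrono::steady_clock::now();
    const long sum = body();
    const auto stop = std::chrono::steady_clock::now();
    return { sum, std::chrono::duration<double>(stop - start).count() };
}

// n objects of alternating dynamic type, each called `rounds` times.
Result run_virtual(std::size_t n, int rounds) {
    assert(rounds > 0);
    std::vector<std::unique_ptr<VBase>> objs;
    for (std::size_t i = 0; i < n; ++i) {
        if (i % 2 == 0) objs.push_back(std::make_unique<VDerived1>());
        else            objs.push_back(std::make_unique<VDerived2>());
    }
    return time_it([&] {
        long acc = 0;
        for (int r = 0; r < rounds; ++r)
            for (const auto& o : objs) o->fn1(acc);
        return acc;
    });
}

// Same pattern of types, dispatched through the switch instead.
Result run_switch(std::size_t n, int rounds) {
    assert(rounds > 0);
    std::vector<Tagged> objs;
    for (std::size_t i = 0; i < n; ++i)
        objs.push_back(Tagged{ i % 2 == 0 ? 1 : 2 });
    return time_it([&] {
        long acc = 0;
        for (int r = 0; r < rounds; ++r)
            for (const auto& o : objs) o.fn2(acc);
        return acc;
    });
}
```

Call run_virtual and run_switch from main with the same arguments, check that the two sum fields agree (so you know both did the same work), then compare the seconds fields across several runs and optimization levels.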
It's impossible to answer without specifying compiler and compiler options.
I see no particular reason why your non-virtual code should necessarily be any faster to make the call than the virtual code. In fact, the switch might well be slower than a vtable, since a call using a vtable will load an address and jump to it, whereas the switch will load an integer and do a little bit of thinking. Either one of them could be faster. For obvious reasons, a virtual call is not specified by the standard to be "slower than any other thing you invent to replace it".
I think it's reasonably unlikely that a randomly-chosen compiler will actually inline the call in the virtual case, but it's certainly allowed to (under the as-if rule), since the dynamic type of *test could be determined by data-flow analysis or similar. I think it's reasonably likely that with optimization enabled a randomly-chosen compiler will inline everything in the non-virtual case. But then, you've given a small example with very short functions all in one TU, so inlining is especially easy.
It depends on the platform and the compiler. A switch statement can be implemented as a test and branch or a jump table (i.e., an indirect branch). A virtual function is usually implemented as an indirect branch. If your compiler turns the switch statement into a jump table, the two approaches differ by one additional dereference. If that is the case and this particular usage happens infrequently enough (or thrashes the cache enough) then you might see a difference due to an extra cache miss.
On the other hand, if the switch statement is simply a test and branch, you might see a much bigger performance difference on some in-order CPUs that flush the instruction cache on an indirect branch (or require a high latency between setting the destination of an indirect branch and jumping to it).
If you are really concerned with the overhead of virtual function dispatch, say, for an inner loop over a heterogeneous collection of objects, you might want to reconsider where you perform the dynamic dispatch. It doesn't have to be per object; it could also be per known grouping of objects with the same type.
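As a rough illustration of dispatching per group rather than per object, the sketch below keeps one homogeneous container per concrete type, so the type decision is made once when an object is added and the inner loops contain only statically bound calls. All names here (Derived1Data, Derived2Data, Grouped, add, process, process_all) and the arithmetic in process are assumptions made up for the example, not part of the question's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical payloads standing in for the two derived types.
struct Derived1Data { int value; };
struct Derived2Data { int value; };

// Hypothetical per-type work, resolved by overload at compile time.
inline long process(const Derived1Data& d) { return d.value; }
inline long process(const Derived2Data& d) { return 2L * d.value; }

// One homogeneous container per concrete type.
struct Grouped {
    std::vector<Derived1Data> ones;
    std::vector<Derived2Data> twos;
};

// The only place the type tag is inspected: once, at insertion.
void add(Grouped& g, int type, int value)
{
    assert(type == 1 || type == 2);
    if (type == 1) g.ones.push_back(Derived1Data{ value });
    else           g.twos.push_back(Derived2Data{ value });
}

// The hot loops: no virtual call and no switch per element.
long process_all(const Grouped& g)
{
    long total = 0;
    for (const auto& d : g.ones) total += process(d);
    for (const auto& d : g.twos) total += process(d);
    return total;
}
```

The trade-off is that the original ordering of the objects is lost, so this only fits when the work per object does not depend on processing order.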
It is not necessarily true that avoiding vtables will be faster - to be sure, you should measure yourself.
Note that:
- The static_cast version may introduce a branch (though likely not, if it gets optimized into a jump table).
- The vtable version will result in a jump table on all implementations I know.
See a pattern here?
Generally, you'd prefer a constant-time table lookup over branching code, so the virtual function method seems to be better.