Virtual Function Compared to Pointer Casting
The current version of some code I'm using utilises a slightly odd way of acheiving something which I think could be acheived with polymorphism. More concretely we currently use something like
for(int i=0; i<CObjList.size(); ++i)
{
CObj* W = CObjList[i];
if( W->type == someTypeA )
{
// do some things which also involve casts such as
// ((SomeClassA*) W->objectptr)->someFieldA
开发者_运维百科 }
else if( W->type == someTypeB )
{
// do some things which also involve casting such as
// ((SomeClassB*) W->objectptr)->someFieldB
}
}
To clarify; each object W contains a void *objectptr;
that is to say a pointer to an arbitrary location. The field W->type
keeps track of what type of object objectptr
points at so that inside our if/else statements we can cast W->objectptr
to the correct type and use it's fields.
However, this seems inherently bad from a code design stand point for several reasons;
- We have no guarantee that the object pointed to by
W->objectptr
actually matches what is said inW->type
so the cast is inherently unsafe. - Every time we wish to add another type we must add another elseif statement and ensure
W->type
is set correctly.
It seems to be this would be much better solved with something like
class CObj
{
public:
virtual void doSomething(/* some params */)=0;
};
class SomeClassA : public CObj
{
public:
virtual void doSomething(/* some params */);
int someFieldA;
}
class SomeClassB : public CObj
{
public:
virtual void doSomething(/* some params */);
int someFieldB;
}
// sometime later...
for(int i=0; i<CObjList.size(); ++i)
{
CObj* W = CObjList[i];
W->doSomething(/* some params */);
}
This having been said there is the proviso that in this setting performace is important. This code will be called from a (relatively) tight loop.
My question is then; is the added complexity of a few vtable lookups outweighed by the improved code design and extensibility and is this likely to affect performace alot?
EDIT: It occurs to me that accessing the fields through a pointer in this way could be as bad as vtable lookups anyway due to cache misses etc. Any thoughts on this?
---- EDIT 2: Also I forgot to mention (and I know it's a bit off the original topic), inside the if statements are many calls to member functions of the surrounding class. How would you design the structure so as to be able to call these from inside doSomething()?
I'm going to answer specifically on the performance angle, because I work in a perf-critical environment and a while ago I happened to run measurements on a similar case to work out the fastest solution.
If you are on an x86, PPC, or ARM processor, you want virtual functions in this situation. The performance cost of calling a virtual function is mostly the pipeline bubble induced by mispredicting an indirect branch. Because the instruction fetch stage of the CPU can't know where the computed jmp goes, it can't start fetching bytes from the target address until the branch executes, and thus you have a stall in the pipeline corresponding to the number of stages between the first fetch stage and the branch retire. (On the PPC I know best, that's something like 25 cycles.)
You also have the latency of loading the vtable pointer, but this is often hidden by instruction reordering (the compiler moves the load
instruction so it starts several cycles before you actually need the result and the CPU does other work while the data cache sends you its electrons.)
With the if-cascade approach you instead have some number n of direct, conditional branches — where the target is known at compile time, but whether the jump is taken is determined at runtime. (ie, a jump-on-equal opcode.) In this case the CPU will make a guess (predict) at whether each branch is taken or not, and start fetching instructions accordingly. So, you will only have a bubble if the CPU guesses wrong. Since you are presumably calling this function with different input each time, it's going to mispredict at least one of these branches, and you'll have the exact same bubble that you would with virtuals. In fact, you'll have a whole lot more bubbles — one per if() conditional!
With virtual functions, there's also the risk of an additional data cache miss on loading the vtable, and an icache miss on the jump target. If this function is in a tight loop, then presumably you'll be looking up and calling the same subroutines a lot, and thus the vtable and function bodies will probably still be in cache. You could measure that if you wanted to be really sure.
Use virtual functions, this hypothetical optimization means nothing. What matters is code readability, maintainability and quality.
Optimize later with the aid of a profiler if you really need to tune hot spots. Making your code unmaintainable with that kind of crap is a road to failure.
Also, virtual functions will help you do unit tests, mock interfaces, etc. Programming is about managing complexity....
My question is then; is the added complexity of a few vtable lookups outweighed by the improved code design and extensibility and is this likely to affect performace alot?
C++ compilers should be able to implement virtual functions very efficiently, so I don't think there's a downside in using them. (And certainly a huge maintainability/readability benefit!) But you should measure to make sure.
The way they are typically implemented is that each object has a vtable pointer. (multiple pointers in case of multiple inheritance, but let's forget that for now) This has the following relative costs over non-virtual functions.
- data space: one pointer per object
- data space: one vtable per class (not per object!)
- time: worstcase = two memory reads per function call (1 to get the vtable address, 1 to get the function address within the vtable). The offset in the vtable is known at compile time, because you know which function you're calling. There's no extra jumps.
Compare this with the costs of the non-OOP approach your existing software has.
- data space: one type ID per object
- code space: one if/else tree or switch statement each time you wish to call a function dependent on the object type
- time: having to evaluate the if/else tree or switch statement.
I'd vote for the virtual function approach as actually being faster than the non-OOP approach, because it eliminates the need to take the time and figure out what type of object it is.
I had some experience with some largish (1M+ line I think) scientific computation code that was using a similar type based switch construct. They refactored to a properly polymorphic based approach and got a significant speedup. Exactly the opposite of what they expected!
Turned out the compiler was better able to optimise some things in that structure.
However this was a long time ago (8 years or so) .. so who knows what modern compilers will do. Don't guess - profile it.
As piotr says the right answer is probably virtual functions. You'll have to test.
But to address your concern about the casts:
- Never use C-style casts in a C++ program use static_cast<>, dynamic_cast<> etc..
- In your specific case, use dynamic_cast<>. At least then you will get an exception if the types are not properly related, which better than a wild crash.
CRTP would be a great idea for such kind of cases.
Edit: In your case,
template<class T>
class CObj
{
public:
void doSomething(/* some params */)
{
static_cast<T*>(this)->doSomething(...);
}
};
class SomeClassA : public CObj<SomeClassA>
{
public:
void doSomething(/* some params */);
int someFieldA;
};
class SomeClassB : public CObj<<SomeClassB>
{
public:
void doSomething(/* some params */);
int someFieldB;
};
Now you may have to choose your loop code in different way to accommodate all objects of different CObj<T>
type.
精彩评论