dynamic_cast overhead in C++
I know that dynamic_cast has a runtime check and is therefore considered safer (it can return a null pointer on failure) but slower than static_cast. But how bad is the overhead between the two?
Should I really consider using static_cast in loops for performance reasons in regular large projects? Or is the difference minor and only relevant for special real-time programs?
Did you profile it?
The rule is:
- Use static_cast when you know that the target type is valid.
- Use dynamic_cast when you're not sure, and you need the program to look up the object's runtime type for you.
It's as simple as that. All other considerations are secondary.
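A minimal sketch of that rule in code (Base and Derived are placeholder names, not from the question):

struct Base { virtual ~Base() {} };
struct Derived : Base {};

void use(Base* p) {
    // You know p really points to a Derived (e.g. by construction):
    Derived* d1 = static_cast<Derived*>(p);   // no runtime check

    // You are not sure -- let the runtime decide:
    Derived* d2 = dynamic_cast<Derived*>(p);  // null if p is not a Derived
    if (d2) {
        // safe to use d2 here
    }
    (void)d1;
}

int main() {
    Derived d;
    use(&d);
}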
Depends on how the dynamic cast does its class safety/correctness check. In systems I've profiled, it can turn into a very large amount of string compares very quickly. It's a big enough deal that we pretty much use an assert_cast style system where static cast is done for performance and dynamic is used for debug.
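A rough sketch of such an assert_cast (the name, the NDEBUG switch, and the template shape are my assumptions, not the poster's actual code):

#include <cassert>

// Checked with dynamic_cast in debug builds; plain static_cast when
// NDEBUG is defined (release builds).
template <typename To, typename From>
To assert_cast(From* p) {
#ifndef NDEBUG
    To q = dynamic_cast<To>(p);
    assert((q != 0 || p == 0) && "assert_cast: wrong runtime type");
    return q;
#else
    return static_cast<To>(p);
#endif
}

// Usage: Foo* f = assert_cast<Foo*>(basePointer);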
Extremely large C++ codebases (e.g. Mozilla, OpenOffice) have a habit of disabling RTTI (and therefore being unable to use dynamic_cast and exceptions) because the overhead of merely including RTTI data in the executable is seen as unacceptable. In particular, it is reported to cause a large increase in startup time (I remember numbers on the order of 10%) due to additional dynamic relocations.
Whether the additional code required to avoid dynamic_cast and exceptions is actually even slower is never discussed.
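For context, that RTTI-free replacement usually amounts to a hand-rolled type tag plus a checked static_cast, something like this sketch (class names invented; similar in spirit to LLVM's isa/dyn_cast machinery):

// Each concrete class carries an explicit kind tag instead of relying on RTTI.
struct Node {
    enum Kind { TextKind, ElementKind };
    explicit Node(Kind k) : kind(k) {}
    virtual ~Node() {}
    Kind kind;
};

struct Text    : Node { Text()    : Node(TextKind)    {} };
struct Element : Node { Element() : Node(ElementKind) {} };

// A dynamic_cast substitute: check the tag, then static_cast.
inline Text* toText(Node* n) {
    return (n && n->kind == Node::TextKind) ? static_cast<Text*>(n) : 0;
}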
Tomalak Geret'kal is right: use static_cast when you know, dynamic_cast when you don't. If you want to avoid the cost, you have to structure your design in such a way that you DO know. Storing separate types in separate containers will make your loop logic more complex, but you can fix that with template algorithms, as sketched below.
For simple inheritance trees it's pretty fast. If you are casting sideways in a complex hierarchy, with virtual inheritance, then it has to do a nontrivial search.
Examples:
struct Base {virtual ~Base () {}};
struct Foo : Base {};
struct Bar1 : virtual Base {};
struct Bar2 : virtual Base {};
struct Baz : Bar1, Bar2 {};
Base * a = new Foo ();
Bar1 * b = new Baz ();
dynamic_cast <Foo *> (a); // fast
dynamic_cast <Bar2 *> (b); // slow
The performance will depend a lot on the compiler. Measure, measure, measure! Bear in mind that run time type information is typically factored out and will be in non-local memory -- you should consider what the cache is going to do in loops.
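If a dynamic_cast does end up in a hot loop, one cheap mitigation (my suggestion, not part of the answer above; the Foo/Base names just echo the example) is to do the cast once, outside the loop:

#include <cstddef>

struct Base { virtual ~Base() {} };
struct Foo : Base { void work() { /* ... */ } };

void process(Base* b, std::size_t n) {
    // One runtime type lookup instead of n of them.
    if (Foo* f = dynamic_cast<Foo*>(b)) {
        for (std::size_t i = 0; i < n; ++i)
            f->work();
    }
}

int main() {
    Foo foo;
    process(&foo, 1000);
}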
I just tried out a small benchmark of casts (on my ~3-year-old netbook, so the numbers are quite high). This is the test setup:
#include <cstddef>  // for NULL

class A {
public:
    virtual ~A() {}
};

class B : public A {
};

// Run the expression DO 2^30 times; the volatile local is intended to keep
// the result from being optimised away.
#define IT(DO) \
    for (unsigned i(1<<30); i; i--) { \
        B* volatile b(DO); \
        (void)b; \
    }

#define CastTest(CAST) IT(CAST<B*>(a))
#define NullTest()     IT(NULL)

int main(int argc, char** argv) {
    if (argc < 2) {
        return 1;
    }
    A* a(new B());
    switch (argv[1][0]) {
    case 'd':
        CastTest(dynamic_cast)
        break;
    case 's':
        CastTest(static_cast)
        break;
    default:
        NullTest()
        break;
    }
    return 0;
}
I found that it is highly dependent on the compiler optimisation, so here are my results:
(see Evaluation below)
O0:
g++ -O0 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m7.139s    0m6.112s    0m0.044s
./a.out s    0m8.177s    0m6.980s    0m0.024s
./a.out d    1m38.107s   1m23.929s   0m0.188s

O1:
g++ -O1 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.412s    0m3.868s    0m0.032s
./a.out s    0m4.653s    0m4.048s    0m0.000s
./a.out d    1m33.508s   1m21.209s   0m0.236s

O2:
g++ -O2 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.526s    0m3.960s    0m0.044s
./a.out s    0m4.862s    0m4.120s    0m0.004s
./a.out d    0m2.835s    0m2.548s    0m0.008s

O3:
g++ -O3 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.896s    0m4.308s    0m0.004s
./a.out s    0m5.032s    0m4.284s    0m0.008s
./a.out d    0m4.828s    0m4.160s    0m0.008s
Edit: Evaluation
For one cast (the test above performs a total of 2**30 casts; the per-cast time below is the difference in user time between the dynamic_cast run and the static_cast run, divided by 2**30) we get the following times for the minimal example above:
-O0 71.66 ns
-O1 71.86 ns
-O2 -1.46 ns
-O3 -0.11 ns
The negative values are probably due to different load at the moment the program was executed, and they are small enough to be discarded as insignificant (i.e. effectively zero). Since there is no overhead there, we have to assume that the compiler was smart enough to optimise the cast away, even though we declared b volatile. Hence, the only reliable values are the ~70 ns results.
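One way to make the -O2/-O3 runs measure anything at all (my suggestion, not part of the original benchmark; GCC/Clang-specific inline asm) is to launder the source pointer through an empty asm statement each iteration, so the compiler cannot hoist or fold the cast out of the loop:

// Replacement for the IT macro: the empty asm marks 'a' as potentially
// modified, so the cast must actually be performed on every iteration.
#define IT(DO) \
    for (unsigned i(1<<30); i; i--) { \
        asm volatile("" : "+r"(a)); \
        B* volatile b(DO); \
        (void)b; \
    }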