dynamic_cast overhead in C++
I know that dynamic_cast has a runtime check and is therefore considered safer (it can return a null pointer on failure) but slower than static_cast. But how bad is the overhead between the two?
Should I really consider using static_cast in loops for performance reasons in regular large projects? Or is the difference minor and only relevant for special real-time programs?
Did you profile it?
The rule is:
- Use static_cast when you know that the target type is valid.
- Use dynamic_cast when you're not sure, and you need the program to look up the object's runtime type for you.
It's as simple as that. All other considerations are secondary.
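A minimal sketch of that rule in code (Base and Derived are placeholder names, not from the question):

struct Base { virtual ~Base() {} };
struct Derived : Base {};

void use(Base* p) {
    // You know p really points to a Derived (e.g. by construction):
    Derived* d1 = static_cast<Derived*>(p);   // no runtime check

    // You are not sure -- let the runtime decide:
    Derived* d2 = dynamic_cast<Derived*>(p);  // null if p is not a Derived
    if (d2) {
        // safe to use d2 here
    }
    (void)d1;
}

int main() {
    Derived d;
    use(&d);
}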
Depends on how the dynamic cast does its class safety/correctness check. In systems I've profiled, it can turn into a very large amount of string compares very quickly. It's a big enough deal that we pretty much use an assert_cast style system where static cast is done for performance and dynamic is used for debug.
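A rough sketch of such an assert_cast (the name, the NDEBUG switch, and the template shape are my assumptions, not the poster's actual code):

#include <cassert>

// Checked with dynamic_cast in debug builds; plain static_cast when
// NDEBUG is defined (release builds).
template <typename To, typename From>
To assert_cast(From* p) {
#ifndef NDEBUG
    To q = dynamic_cast<To>(p);
    assert((q != 0 || p == 0) && "assert_cast: wrong runtime type");
    return q;
#else
    return static_cast<To>(p);
#endif
}

// Usage: Foo* f = assert_cast<Foo*>(basePointer);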
Extremely large C++ codebases (e.g. Mozilla, OpenOffice) have a habit of disabling RTTI (and therefore being unable to use dynamic_cast and exceptions) because the overhead of merely including RTTI data in the executable is seen as unacceptable. In particular, it is reported to cause a large increase in startup time (I remember numbers on the order of 10%) due to additional dynamic relocations.
Whether the additional code required to avoid dynamic_cast and exceptions is actually even slower is never discussed.
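For context, that RTTI-free replacement usually amounts to a hand-rolled type tag plus a checked static_cast, something like this sketch (class names invented; similar in spirit to LLVM's isa/dyn_cast machinery):

// Each concrete class carries an explicit kind tag instead of relying on RTTI.
struct Node {
    enum Kind { TextKind, ElementKind };
    explicit Node(Kind k) : kind(k) {}
    virtual ~Node() {}
    Kind kind;
};

struct Text    : Node { Text()    : Node(TextKind)    {} };
struct Element : Node { Element() : Node(ElementKind) {} };

// A dynamic_cast substitute: check the tag, then static_cast.
inline Text* toText(Node* n) {
    return (n && n->kind == Node::TextKind) ? static_cast<Text*>(n) : 0;
}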
Tomalak Geret'kal is right: use static_cast when you know, dynamic_cast when you don't. If you want to avoid the cost, you have to structure your design in such a way that you DO know. Storing separate types in separate containers will make your loop logic more complex, but you can fix that with template algorithms, as sketched below.
For simple inheritance trees it's pretty fast. If you are casting sideways in a complex hierarchy, with virtual inheritance, then it has to do a nontrivial search.
Examples:
struct Base {virtual ~Base () {}};
struct Foo : Base {};
struct Bar1 : virtual Base {};
struct Bar2 : virtual Base {};
struct Baz : Bar1, Bar2 {};
Base * a = new Foo ();
Bar1 * b = new Baz ();
dynamic_cast <Foo *> (a); // fast
dynamic_cast <Bar2 *> (b); // slow
The performance will depend a lot on the compiler. Measure, measure, measure! Bear in mind that run time type information is typically factored out and will be in non-local memory -- you should consider what the cache is going to do in loops.
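If a dynamic_cast does end up in a hot loop, one cheap mitigation (my suggestion, not part of the answer above; the Foo/Base names just echo the example) is to do the cast once, outside the loop:

#include <cstddef>

struct Base { virtual ~Base() {} };
struct Foo : Base { void work() { /* ... */ } };

void process(Base* b, std::size_t n) {
    // One runtime type lookup instead of n of them.
    if (Foo* f = dynamic_cast<Foo*>(b)) {
        for (std::size_t i = 0; i < n; ++i)
            f->work();
    }
}

int main() {
    Foo foo;
    process(&foo, 1000);
}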
I just tried out a small benchmark of casts (on my ~3-year-old netbook, so the numbers are quite high). This is the test setup:
#include <cstddef>  // for NULL

class A {
public:
    virtual ~A() {}
};

class B : public A {
};

// Run the expression DO 2^30 times; the volatile local is intended to keep
// the result from being optimised away.
#define IT(DO) \
    for (unsigned i(1<<30); i; i--) { \
        B* volatile b(DO); \
        (void)b; \
    }

#define CastTest(CAST) IT(CAST<B*>(a))
#define NullTest()     IT(NULL)

int main(int argc, char** argv) {
    if (argc < 2) {
        return 1;
    }
    A* a(new B());
    switch (argv[1][0]) {
    case 'd':
        CastTest(dynamic_cast)
        break;
    case 's':
        CastTest(static_cast)
        break;
    default:
        NullTest()
        break;
    }
    return 0;
}
I found that it is highly dependent on the compiler optimisation, so here are my results:
(see Evaluation below)
O0:
g++ -O0 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m7.139s    0m6.112s    0m0.044s
./a.out s    0m8.177s    0m6.980s    0m0.024s
./a.out d    1m38.107s   1m23.929s   0m0.188s

O1:
g++ -O1 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.412s    0m3.868s    0m0.032s
./a.out s    0m4.653s    0m4.048s    0m0.000s
./a.out d    1m33.508s   1m21.209s   0m0.236s

O2:
g++ -O2 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.526s    0m3.960s    0m0.044s
./a.out s    0m4.862s    0m4.120s    0m0.004s
./a.out d    0m2.835s    0m2.548s    0m0.008s

O3:
g++ -O3 -Wall castbench.cpp; time ./a.out _; time ./a.out s; time ./a.out d

             real        user        sys
./a.out _    0m4.896s    0m4.308s    0m0.004s
./a.out s    0m5.032s    0m4.284s    0m0.008s
./a.out d    0m4.828s    0m4.160s    0m0.008s
Edit: Evaluation
For one cast (the test above performs a total of 2**30 casts; the per-cast time below is the difference in user time between the dynamic_cast run and the static_cast run, divided by 2**30) we get the following times for the minimal example above:
-O0 71.66 ns
-O1 71.86 ns
-O2 -1.46 ns
-O3 -0.11 ns
The negative values are probably due to different load at the moment the program was executed, and they are small enough to be discarded as insignificant (i.e. effectively zero). Since there is no overhead there, we have to assume that the compiler was smart enough to optimise the cast away, even though we declared b volatile. Hence, the only reliable values are the ~70 ns results.
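One way to make the -O2/-O3 runs measure anything at all (my suggestion, not part of the original benchmark; GCC/Clang-specific inline asm) is to launder the source pointer through an empty asm statement each iteration, so the compiler cannot hoist or fold the cast out of the loop:

// Replacement for the IT macro: the empty asm marks 'a' as potentially
// modified, so the cast must actually be performed on every iteration.
#define IT(DO) \
    for (unsigned i(1<<30); i; i--) { \
        asm volatile("" : "+r"(a)); \
        B* volatile b(DO); \
        (void)b; \
    }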