Cost of a virtual function in a tight loop

2023-03-17 23:10 Q&A author:

I am in a situation where I have game objects that have a virtual function Update(). There are a lot of game objects (currently a little over 7000) and the loop calls update for all of them (amongst other things). My colleague suggested that we should remove the virtual function altogether. As you can imagine, this will take a lot of refactoring.

I have seen this answer but in my case, profiling means I have to change a lot of code. So before I even think of starting I thought I'd ask here for opinions on whether refactoring is worth it in this case.

Note that I have profiled other parts of the loop and have been trying to optimize the parts that take the longest. I suspect that the virtual function calls in this case are something I should not worry about, but I cannot be sure until I profile, and I cannot profile until I change the code (which is a lot). Also note that some update functions are very small while others are larger and more complex.

EDIT: There are multiple answers that give great insight, so anybody who stumbles onto this question in the future, have a look at all the answers and not just the selected one.

A virtual function call is not going to add much more than a single indirection and a hard-to-predict jump. In the usual worst case that costs you one pipeline flush, or about 20 cycles, per virtual call. 7000 of them comes to about 140000 cycles, which should be negligible compared to your average update function. If it isn't, for example because most of your update functions are empty, consider keeping the objects that actually need updating in a separate list and looping over only that list.
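The separate-list idea could look roughly like the sketch below. All names here (GameObject, NeedsUpdate, Mover, Rock, World) are hypothetical; the question only says that game objects have a virtual Update(). Objects that never need updating are still owned by the world but are never visited by the update loop.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class; only the virtual Update() comes from the question.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void Update() {}                          // default: nothing to do
    virtual bool NeedsUpdate() const { return false; } // assumed opt-in flag
};

// Hypothetical object with real per-frame work.
struct Mover : GameObject {
    int x = 0;
    void Update() override { ++x; }
    bool NeedsUpdate() const override { return true; }
};

// Hypothetical object that never changes, so it is never updated.
struct Rock : GameObject {};

class World {
public:
    void Add(std::unique_ptr<GameObject> obj) {
        // Register the raw pointer only if the object asks for updates.
        if (obj->NeedsUpdate()) updatable_.push_back(obj.get());
        all_.push_back(std::move(obj)); // ownership stays in all_
    }

    // Calls Update() only on registered objects; returns the number of calls.
    std::size_t UpdateAll() {
        for (GameObject* obj : updatable_) obj->Update();
        return updatable_.size();
    }

    std::size_t Size() const { return all_.size(); }

private:
    std::vector<std::unique_ptr<GameObject>> all_; // owns every object
    std::vector<GameObject*> updatable_;           // non-owning subset
};
```

The virtual call is still there; the loop just makes fewer of them. A real version would also have to unregister objects from the subset when they are destroyed, which this sketch leaves out.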

Removing the virtual function is just going to lead to one of you replacing it with an identical but self-implemented system. This is the exact kind of place where a virtual function makes sense.

For reference, 140000 cycles is about 50 microseconds. That's assuming a P4 with a huge pipeline and a full pipeline flush on every call (which you don't usually get).
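The arithmetic behind that estimate can be written down as a small helper. The function name is made up for illustration, and the 2.8 GHz clock used in the comment is an assumption inferred from the two numbers above (140000 cycles in roughly 50 microseconds); the answer does not state a clock speed.

```cpp
// Back-of-the-envelope cost of the dispatch overhead alone, in microseconds.
//   calls           : virtual calls per frame (7000 in the question)
//   cycles_per_call : assumed worst-case penalty per call (about 20 above)
//   clock_ghz       : assumed CPU clock; 1 GHz is 1000 cycles per microsecond
constexpr double OverheadMicroseconds(double calls,
                                      double cycles_per_call,
                                      double clock_ghz) {
    return calls * cycles_per_call / (clock_ghz * 1000.0);
}

// With the numbers from the answer and an assumed 2.8 GHz clock:
//   7000 * 20 = 140000 cycles, and 140000 / 2800 = 50 microseconds.
```

If the game targets 60 frames per second (an assumption, not something the question says), the frame budget is about 16,667 microseconds, so this worst-case overhead would be roughly 0.3% of a frame.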

Although it's not the same code and may not be the same compiler as you're using, here's a bit of reference data from a rather old benchmark (bench++ by Joe Orost):

Test Name:   F000005                         Class Name:  Style
CPU Time:        7.70  nanoseconds           plus or minus      0.385
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way if/else if statement
 compare this test with F000006


Test Name:   F000006                         Class Name:  Style
CPU Time:        2.00  nanoseconds           plus or minus     0.0999
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way switch statement
 compare this test with F000005


Test Name:   F000007                         Class Name:  Style
CPU Time:        3.41  nanoseconds           plus or minus      0.171
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way sparse switch statement
 compare this test with F000005 and F000006


Test Name:   F000008                         Class Name:  Style
CPU Time:        2.20  nanoseconds           plus or minus      0.110
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way virtual function class
 compare this test with F000006

This particular result is from compiling with the 64-bit edition of VC++ 9.0 (VS 2008), but it's reasonably similar to what I've seen from other recent compilers. The bottom line is that the virtual function is faster than most of the obvious alternatives, and very close to the same speed as the only one that beats it (in fact, the difference between the two is within the measured margin of error). That, however, depends on the values involved being dense -- as you can see in F000007, if the values are sparse, the switch statement produces code that's slower than the virtual function call.
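To make the comparison concrete, here is a minimal sketch of the two dispatch styles being measured, reduced to three cases instead of ten. This is not the bench++ source; the operations and every name in it are invented for illustration, and it shows only the shape of the code, not any timing.

```cpp
// Style 1: dispatch on an integer tag with a dense switch.
inline int ApplySwitch(int kind, int v) {
    switch (kind) {
        case 0: return v + 1;
        case 1: return v * 2;
        case 2: return v - 3;
        default: return v;
    }
}

// Style 2: dispatch through a virtual function.
struct Op {
    virtual ~Op() = default;
    virtual int Apply(int v) const = 0;
};
struct AddOne   : Op { int Apply(int v) const override { return v + 1; } };
struct Double   : Op { int Apply(int v) const override { return v * 2; } };
struct SubThree : Op { int Apply(int v) const override { return v - 3; } };

// Maps a tag to the matching object. kind must be 0, 1, or 2;
// anything else indexes past the end of the table.
inline const Op& OpFor(int kind) {
    static const AddOne a{};
    static const Double d{};
    static const SubThree s{};
    static const Op* const table[] = {&a, &d, &s};
    return *table[kind];
}
```

With dense case values a compiler can typically turn the switch into a jump table, which is one indirect jump, much like the virtual call. That is consistent with the two results above being so close.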

Bottom line: The virtual function call is probably the wrong place to look. Refactored code might easily work out slower, and even at best it probably won't gain enough to notice or care about.

If you can't profile, have a look at the assembler code to get an idea how expensive the lookup really is. It might be a simple indirect jump which costs almost nothing.

If you need to refactor, here is a suggestion: Create lots of "UpdateXxx" classes which know how to call the new non-virtual update() method. Collect those in an array and then call update() on them.
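One possible reading of that suggestion is a per-type batch: each concrete type gets its own container, and the loop over that container calls a non-virtual update() directly. The sketch below assumes this reading; Particle, Enemy, UpdateBatch and Scene are hypothetical names, not anything from the original code base.

```cpp
#include <vector>

// Hypothetical concrete types with a plain, non-virtual update().
struct Particle {
    float y = 0.0f;
    void update() { y += 1.0f; }
};

struct Enemy {
    int hp = 100;
    void update() { if (hp > 0) --hp; }
};

// One batch per concrete type. The call inside the loop is resolved at
// compile time, so there is no virtual dispatch per object.
template <typename T>
class UpdateBatch {
public:
    void add(const T& obj) { items_.push_back(obj); }
    void updateAll() {
        for (T& item : items_) item.update();
    }
    const std::vector<T>& items() const { return items_; }

private:
    std::vector<T> items_; // objects stored by value, contiguous in memory
};

// The scene lists its batches explicitly instead of holding one
// heterogeneous list of base-class pointers.
struct Scene {
    UpdateBatch<Particle> particles;
    UpdateBatch<Enemy> enemies;
    void updateAll() {
        particles.updateAll();
        enemies.updateAll();
    }
};
```

The cost is that every new object type has to be added to Scene by hand, which is part of why the refactoring in the question is so large.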

But my guess is that you won't save much, especially not with only 7K objects.

Note on profiling: If you can't use a profiler (makes me wonder why not), time the calls to update() and log calls which take longer than, say, 100ms. The timing isn't expensive and it allows you to quickly figure out which calls are most expensive.
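A minimal sketch of that kind of manual timing, using std::chrono::steady_clock from the standard library. The GameObject interface, the SleepyObject stand-in and the TimedUpdateAll function are hypothetical; the threshold is a parameter, so it can be set to 100 ms or anything smaller.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical interface; only the virtual Update() comes from the question.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void Update() = 0;
};

// Stand-in for an expensive object: its update simply sleeps for 5 ms.
struct SleepyObject : GameObject {
    void Update() override {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
};

// Times every Update() call, logs the ones that exceed the threshold to
// stderr, and returns how many calls were logged.
std::size_t TimedUpdateAll(const std::vector<GameObject*>& objects,
                           std::chrono::microseconds threshold) {
    using Clock = std::chrono::steady_clock; // monotonic, safe for intervals
    std::size_t slow = 0;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const auto start = Clock::now();
        objects[i]->Update();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::microseconds>(
                Clock::now() - start);
        if (elapsed > threshold) {
            std::fprintf(stderr, "slow Update(): object %zu took %lld us\n",
                         i, static_cast<long long>(elapsed.count()));
            ++slow;
        }
    }
    return slow;
}
```

Logging the object's index or type alongside the time is what makes the output useful: it points at which update functions to look at, without touching the class hierarchy.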

Another test comparing virtual, inline and direct calls can be found in "Virtual functions and performance - C++".
