
Overhead of calling tiny functions from a tight inner loop? [C++]

Say you see a loop like this one:

for(int i=0;
    i<thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
    ++i)
{
  thing.getData().insert(
    thing.getData().count(),
    thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}

If this were Java I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it... however, I don't know if the compiler is smart enough to make that futile. This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following:

  • Is having a messy condition in the for loop going to get evaluated each time, or only once?
  • If those get methods are simply returning references to member variables on the objects, will they be inlined away?
  • Would you expect custom [] operators to get optimized at all?

In other words, is it worth the time (in performance only, not readability) to convert it to something like:

ElementContainer &source = 
   thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for(int i=0;i<num;++i)
{
  destination.insert(thing.getData().count(), source[i].getName());
}

Remember, this is a tight loop, called millions of times a second. What I wonder is whether all this will shave just a couple of cycles per iteration or something more substantial.


Yes I know the quote about "premature optimisation". And I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.


The general way to answer such questions is to look at the produced assembly. With gcc, this involves replacing the -c flag with -S.
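
For instance, here is a minimal sketch (a hypothetical Thing type, not the asker's real classes) that you can compile and inspect to see whether the getter survives as a call or gets inlined and hoisted:

// example.cpp -- compile with: g++ -O2 -S example.cpp   (produces example.s)
//                or, with MSVC: cl /O2 /FA example.cpp  (produces example.asm)
struct Thing {
  int count_ = 0;
  // Defined inside the class body, so the compiler has the full definition
  // available and can inline it when optimisation is enabled.
  int count() const { return count_; }
};

int sum(const Thing &t) {
  int total = 0;
  // Look at the listing: with -O2 / /O2 the call to count() should disappear,
  // and the loop bound may even be hoisted to a single load before the loop.
  for (int i = 0; i < t.count(); ++i)
    total += i;
  return total;
}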

My own rule is not to fight the compiler. If something should be inlined, I make sure the compiler has all the information it needs to perform that inlining, and (possibly) I try to nudge it with an explicit inline keyword.

Also, inlining saves a few opcodes but makes the code grow, which, as far as L1 cache is concerned, can be very bad for performance.


All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it is important to you, you should (as always) look at the code the compiler is emitting and do some timing experiments. Make sure your code is compiled with all optimisations turned on - this can make a big difference for things like operator[](), which is often implemented as an inline function, but which won't be inlined (in GCC at least) unless you turn on optimisation.
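
A minimal timing-experiment sketch (the loop body here is a placeholder, and <chrono> is used purely for illustration); build both variants with optimisation on (-O2 or /O2) and compare:

#include <chrono>
#include <cstdio>

int main()
{
  auto start = std::chrono::steady_clock::now();
  volatile long long sink = 0;     // volatile so the optimiser cannot delete the loop entirely
  for (int i = 0; i < 100000000; ++i)
    sink = sink + i;               // replace with the loop variant you want to measure
  auto stop = std::chrono::steady_clock::now();
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
  std::printf("%lld ms\n", static_cast<long long>(ms));
  return 0;
}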


If the loop is that critical, I can only suggest that you look at the generated code. If the compiler is allowed to optimise the calls away aggressively, then perhaps it will not be an issue. Sorry to say this, but modern compilers can optimise incredibly well, and I really would suggest profiling to find the best solution in your particular case.


If the methods are small and can and will be inlined, then the compiler may do the same optimizations that you have done. So, look at the generated code and compare.

Edit: It is also important to mark const methods as const; e.g., in your example, count() and getName() should be const to let the compiler know that these methods do not alter the contents of the given object.
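
A hypothetical sketch of what that looks like (these are stand-in classes, not the asker's real ones):

#include <string>
#include <vector>

class Element {
  std::string name_;
public:
  // const: promises the call does not modify the Element
  const std::string &getName() const { return name_; }
};

class ElementContainer {
  std::vector<Element> elements_;
public:
  int count() const { return static_cast<int>(elements_.size()); }  // const accessor
  const Element &operator[](int i) const { return elements_[i]; }   // const subscript
};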


As a rule, you should not have all that garbage in your "for condition" unless the result is going to change during loop execution.

Use another variable, set outside the loop. This will eliminate the WTF when reading the code, it will not negatively impact performance, and it will sidestep the question of how well the functions get optimized. If those calls are not optimized away, this will also result in a performance increase.


I think in this case you are asking the compiler to do more than it legitimately can given the scope of compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects you might have from that long chain of function calls. I would assume that breaking out the test would be faster unless I have benchmarking (or disassembly) that shows otherwise.

This is one of the cases where a JIT compiler has a big advantage over a C++ compiler. It can in principle optimize for the most common case seen at runtime and emit code specialised for that case (plus guard checks to make sure execution actually falls into that case). This sort of thing is used all the time for polymorphic method calls that turn out not to actually be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.

For what it's worth, if speed really mattered, I'd split it up in Java too.
