
Java - calling static methods vs manual inlining - performance overhead

I am wondering whether I should manually inline small methods that are called 100k to 1 million times in a performance-sensitive algorithm.

At first, I thought that by not inlining I was incurring some overhead, since the JVM has to determine whether or not to inline the method (or might even fail to do so).

However, the other day I replaced some manually inlined code with invocations of static methods and saw a performance boost. How is that possible? Does this suggest that there is actually no overhead, and that letting the JVM inline at will actually boosts performance? Or does this depend heavily on the platform/architecture?

(The example in which a performance boost occurred was replacing array swapping (int t = a[i]; a[i] = a[j]; a[j] = t;) with a static method call swap(int[] a, int i, int j). Another example, in which there was no performance difference, was when I inlined a 10-line method that was called 1,000,000 times.)
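
For reference, here is a minimal sketch of the two variants being compared, using array reversal as a stand-in for the real algorithm (the class, method names, and loop are illustrative, not my actual code):

final class SwapVariants {
    // Manually inlined: the three swap statements are written directly in the hot loop.
    static void reverseInlined(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    // Factored out: the same work goes through a small static method,
    // leaving the inlining decision to the JIT compiler.
    static void reverseWithHelper(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            swap(a, i, j);
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}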


I have seen something similar. "Manual inlining" isn't necessarily faster; the resulting program can be too complex for the optimizer to analyze.

For your example, let's make some wild guesses. When you use the swap() method, the JVM may be able to analyze the method body and conclude that, since i and j don't change, only 2 range checks are needed instead of 4, even though there are 4 array accesses. Also, the local variable t isn't really necessary; the JVM can use 2 registers to do the job, without reading/writing t on the stack.

Later, the body of swap() is inlined into the calling method. That happens after the previous optimization, so the savings are still in place. It's even possible that the caller's body has already proved that i and j are always within range, so the 2 remaining range checks are dropped as well.

Now in the manually inlined version, the optimizer has to analyze the whole enclosing method at once; there are too many variables and too many actions, and it may fail to prove that it's safe to drop the range checks or to eliminate the local variable t. In the worst case this version may cost 6 more memory accesses to do the swap, which is a huge overhead. Even if there is only 1 extra memory read, it is still very noticeable.

Of course, we have no basis to believe that it's always better to do manual "outlining", i.e. to extract small methods in the wishful hope that it will help the optimizer.

--

What I've learned is to forget about manual micro-optimizations. It's not that I don't care about small performance improvements, and it's not that I always trust the JVM's optimizer. It's that I have absolutely no idea what I could do that would do more good than harm. So I gave up.


The JVM can inline small methods very efficiently. The only benefit of inlining yourself is if you can remove code, i.e. simplify what it does by inlining it.

The JVM looks for certain structures and has some "hand-coded" optimisations that kick in when it recognises those structures. By using a swap method, you may let the JVM recognise the structure and apply a specific optimisation to it.

You might be interested in trying the OpenJDK 7 debug build, which has an option to print out the native code it generates.
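
For example, something along these lines shows what the JIT is compiling and the native code it produces (the class name is just a placeholder; -XX:+PrintAssembly additionally requires the hsdis disassembler plugin, and flag availability depends on the build):

java -XX:+PrintCompilation MyBenchmark
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly MyBenchmark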


Sorry for my late reply, but I just found this topic and it got my attention.

When developing in Java, try to write "simple and stupid" code. Reasons:

  1. the optimization happens at runtime (since the compilation to native code itself happens at runtime). The compiler will figure out for itself what optimizations to make, since it compiles not the source code you write but the internal representation it uses (several AST -> VM code -> VM code -> ... -> native binary code transformations are performed at runtime by the JVM interpreter and JIT compiler).
  2. when optimizing, the compiler relies on common programming patterns to decide what to optimize; so help it help you: write a private static (maybe also final) method, and it will figure out immediately that it can:
    • inline the method
    • compile it to native code

If the method is manually inlined, it's just part of a larger method which the compiler first has to understand, deciding whether it's time to transform it into binary code or whether it must wait a bit longer to understand the program flow. Also, depending on what the method does, several re-JITs are possible at runtime, so the JVM produces optimal binary code only after a "warm-up"... and maybe your program ended before the JVM warmed itself up (because I would expect the performance to be fairly similar in the end).
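
A rough sketch of what "warm-up" means in practice (the iteration counts and names are arbitrary; a proper harness such as JMH handles this far more carefully):

public class WarmupDemo {
    public static void main(String[] args) {
        int[] data = new int[1024];
        // Warm-up: run the hot code enough times that the JIT has compiled
        // (and possibly re-compiled) it before any timing starts.
        for (int i = 0; i < 20_000; i++) {
            reverse(data);
        }
        // Only measure after the warm-up.
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            reverse(data);
        }
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
    }

    static void reverse(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }
}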

Conclusion: it makes sense to hand-optimize code in C/C++ (since the translation to binary is done statically), but the same optimizations usually don't make a difference in Java, where the JIT compiler works on bytecode, not your source code. And by the way, from what I've seen, javac doesn't even bother to optimize :)


However, the other day I replaced some manually inlined code with invocations of static methods and saw a performance boost. How is that possible?

Probably the JVM profiler sees the bottleneck more easily if it is in one place (a static method) than if it is implemented several times separately.


The HotSpot JIT compiler is capable of inlining a lot of things, especially in -server mode, although I don't know how you ended up with an actual performance boost. (My guess would be that inlining is driven by method invocation counts, and the method swapping the two values isn't called often enough.)
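
One way to check that guess is to ask HotSpot what it actually inlined; its output lists each call site and whether it was inlined or why not (the class name is just a placeholder, and the output format varies between JVM versions):

java -server -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyBenchmark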

By the way, if performance really matters, you could try this for swapping two int values. (I'm not saying it will be faster, but it may be worth a punt.)

// XOR swap without a temporary variable
// (note: it zeroes the element if i == j, since x ^ x == 0)
a[i] = a[i] ^ a[j];
a[j] = a[i] ^ a[j];
a[i] = a[i] ^ a[j];
