Function inlining—what are examples where it hurt performance?
It's conventional wisdom that function inlining doesn't always benefit, and can even hurt performance:
- The Linux kernel style guide warns against excessive inlining
- Google also recommends programmers be careful with inlining
- The C++ FAQ lite says more of the same
I understand why inlining is supposed to help—it eliminates function call overhead by including the called function in its caller.
I also understand why people claim it can hurt performance: inlining functions can in some cases increase code size, which can eventually increase cache misses or even trigger extra page faults. This all makes sense.
I'm having trouble, though, finding specific examples where inlining actually hurts performance. Surely if it's enough of a problem to be worth warning about it, someone somewhere must have come across an example where inlining is a problem. So, I ask…
What is a good, concrete example of code where performance is actually hurt by function inlining?
On some platforms, with large inlined functions, performance can be reduced by causing a "far" jump rather than a relative jump. Inlining may also cause a page fault where the OS needs to haul more code into memory, rather than executing code which may already be resident (as a subroutine).
Some platforms may have optimized jump instructions for "near code". This type of jump uses a signed offset from the current position. The range of the offset may be restricted, for example to -128..+127 bytes for an 8-bit offset. A long jump requires a bigger instruction because it must encode a wider offset or an absolute address. Longer instructions take up more space in the instruction cache, and on some processors take more time to execute.
Long inlined functions may expand the executable so that the OS needs to bring a new page into memory (a page fault, sometimes called a page swap). Page swapping slows down the execution of an application.
These are possible ways in which inlined code could slow performance. The real truth is obtained by profiling.
I had this case in our project in C (gcc). My colleague overused inlines in his library; forcing -fno-inline reduced the CPU time by 10% (on a Sun V890 with UltraSPARC IV+ processors).
Something not mentioned yet is that inlining big functions into other big functions can cause excessive register spilling, which not only hurts the quality of the compiled code but can also add more overhead than the inlining eliminated. It may even upset global and local optimization heuristics; IIRC MSDN has a warning about this under __forceinline. Other constructs can hurt too: non-naked inline asm placed in inlined functions may produce unneeded stack frames, as may inlines with special alignment requirements, or even those that just push the stack allocation into the range where the compiler inserts a stack-checking call (_chkstk under MSVC).
I don't think inlining hurts performance other than indirectly relating to the code being larger, which I think you described.
In general, inlining improves performance by eliminating the call and return.
[In reference to inline functions]
The function is placed in the code, rather than being called, similar to using macros (conceptually)
This can improve speed (no function call), but causes code bloat (if the function is used 100 times, you now have 100 copies)
You should note this does not force the compiler to make the function inline, and it will ignore you if it thinks it's a bad idea. Similarly, the compiler may decide to inline normal functions for you.
This also allows you to place the entire function in a header file, rather than implementing it in a cpp file (which you can't do anyway, since you then get an unresolved external if it was declared inline, unless of course only that cpp file used it).
[Quote snagged from SO user 'Fire Lancer' so credit him]
I have no hard data to back this up, but in the case of the Linux kernel anyway (since the Linux kernel style guide was cited in the question), code size could impact performance because the kernel code occupies physical memory regardless of instruction caching (kernel pages are never paged out).
Memory pages that are used by the kernel are permanently unavailable for user virtual memory. So if you're using memory pages for copies of inlined code that have dubious benefit (the call overhead is generally small for functions that are large), you're having a negative impact on the system for no real benefit.
Why do you need concrete examples of where inlining hurt performance? It is such a context-sensitive issue. It depends on a number of hardware factors, including speed of RAM, CPU model, compiler version and a number of other factors. It's possible to create an example that is slower on my computer but still faster than the non-inlined version on yours. And inlining, in turn, may enable dozens of other compiler optimizations that would not otherwise be performed. So even in a case where the code bloat causes a performance hit, it may enable some compilers to perform a number of other optimizations to compensate for it.
So you're not going to get a more meaningful answer than the theory, of why it may produce slower code.
If you need a specific example of where performance can be hurt by inlining, then go ahead and write it. It's not that difficult once you know the theory.
You want a function that is big enough to pollute the cache if inlined, and you want to call it from several different, but closely related, places. (If you call it from two completely separate modules, then the two instantiations of the function won't compete for the cache space anyway. But if you alternate quickly between several different call sites, then each instantiation may force the previous one out of cache.)
And of course, the function must be written so that little of it can get eliminated when it is inlined. If, upon inlining, the compiler is able to eliminate 80% of the code, then that'll mitigate the performance hit you might otherwise take.
And finally, you'll likely need to force it to be inlined. At best, compilers tend to treat the inline keyword as a hint (sometimes not even that). So you'll likely have to look up compiler-specific ways to force a function to be inlined.
You may also want to disable other optimizations, as the compiler might otherwise be able to optimize the inlined version.
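As a starting point for that kind of experiment, here is a minimal sketch of the scaffolding, not a benchmark that demonstrates a slowdown. The macro names FORCE_INLINE and NO_INLINE, the functions mix_inlined and mix_called, and the helper time_ns are all made up for illustration; only the underlying attributes (__forceinline and __declspec(noinline) on MSVC, always_inline and noinline on GCC/Clang) are real compiler features. With a body this small the forced-inline variant will most likely win; to see the opposite you would need a much larger body and several alternating call sites, as described above.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical portability macros wrapping the compiler-specific attributes.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#endif

// Same body twice: one copy is forced into every call site,
// the other is always reached through a real call.
FORCE_INLINE std::uint64_t mix_inlined(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

NO_INLINE std::uint64_t mix_called(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Runs f in a dependent chain and returns the elapsed wall-clock time in
// nanoseconds. The final value is written to `sink` so the compiler cannot
// discard the loop as dead code.
template <typename F>
double time_ns(F f, std::uint64_t iterations, std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 1;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc = f(acc + i);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

To use it, pass each variant through a lambda, e.g. time_ns([](std::uint64_t v) { return mix_inlined(v); }, n, sink), so the call inside the loop is a direct call rather than one through a function pointer. Run both variants several times and compare; a single run tells you very little.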
So it's pretty straightforward to produce slower code through inlining, once you know what to do. But it's quite a lot of work to do so, especially if you want anything near predictable or deterministic results. And despite your efforts, next year's compilers or next year's CPUs may again be able to outsmart you and produce faster code from your intentionally "over-inlined" code.
So I just don't see why you'd need to do this. Accept that excessive inlining can hurt in some cases, and understand why it can hurt. Beyond that, why bother?
A final point is that those warnings are often misguided, because there's very little to warn about. Because the compiler typically chooses by itself what to inline, and, at best, treats the inline keyword as a hint, it generally doesn't matter whether or not you try to inline everything.
So while it is true that excessive inlining can hurt performance, excessive use of the inline keyword usually doesn't.
The inline keyword has other effects, which should guide its usage. Use it when a function has to be defined in multiple translation units (typically because it lives in a header): inline relaxes the One Definition Rule to allow identical definitions in each of them, which prevents linker errors.
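To illustrate that last point, here is a small sketch of the header case. The file names square.h, a.cpp and b.cpp and the functions square and area_of_square are hypothetical; everything is shown in one listing, with comments marking where the file boundaries would be.

```cpp
// --- square.h (hypothetical header, included from several .cpp files) ---
#ifndef SQUARE_H
#define SQUARE_H

// Without `inline`, every translation unit that includes this header would
// emit its own external definition of square(), and linking them together
// would typically fail with a "multiple definition" error. With `inline`,
// identical definitions in several translation units are permitted.
inline int square(int x) { return x * x; }

#endif // SQUARE_H

// --- a.cpp and b.cpp would both #include "square.h" and use square() ---
int area_of_square(int side) { return square(side); }
```

Whether the compiler actually expands square() at the call site is a separate decision; the keyword here is about linkage, not about speed.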