Inline function v. Macro in C -- What's the Overhead (Memory/Speed)?
I searched Stack Overflow for the pros/cons of function-like macros v. inline functions.
I found the following discussion: Pros and Cons of Different macro function / inline methods in C
...but it didn't answer my primary burning question.
Namely, what is the overhead in C of using a function-like macro (with variables, possibly other function calls) vs. an inline function, in terms of memory usage and execution speed?
Are there any compiler-dependent differences in overhead? I have both icc and gcc at my disposal.
The code snippet I'm modularizing is:
double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = AttractiveTerm * AttractiveTerm;
EnergyContribution +=
4 * Epsilon * (RepulsiveTerm - AttractiveTerm);
My reason for turning it into an inline function/macro is so I can drop it into a C file and then conditionally compile other similar, but slightly different, functions/macros.
e.g.:
double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = pow(SigmaSquared/RadialDistanceSquared,9);
EnergyContribution +=
4 * Epsilon * (RepulsiveTerm - AttractiveTerm);
(note the difference in the second line...)
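Roughly, what I have in mind is something like the following (just a sketch; PairEnergy and USE_NINTH_POWER are placeholder names I made up):

#include <math.h>

static inline double PairEnergy(double SigmaSquared,
                                double RadialDistanceSquared,
                                double Epsilon)
{
    double AttractiveTerm = pow(SigmaSquared / RadialDistanceSquared, 3);
#ifdef USE_NINTH_POWER
    double RepulsiveTerm = pow(SigmaSquared / RadialDistanceSquared, 9);
#else
    double RepulsiveTerm = AttractiveTerm * AttractiveTerm;
#endif
    return 4 * Epsilon * (RepulsiveTerm - AttractiveTerm);
}

with the caller doing EnergyContribution += PairEnergy(...).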
This function is central to my code; it gets called thousands of times per step, and my program performs millions of steps. Thus I want the LEAST overhead possible, which is why I'm spending time worrying about the overhead of inlining vs. transforming the code into a macro.
Based on the prior discussion I already realize the other pros/cons of macros (type independence and the errors that can result from it)... but what I want to know most, and don't currently know, is the PERFORMANCE.
I know some of you C veterans will have some great insight for me!!
Calling an inline function may or may not generate a function call, which typically incurs a very small amount of overhead. The exact situations under which an inline function actually gets inlined vary depending on the compiler; most make a good-faith effort to inline small functions (at least when optimization is enabled), but there is no requirement that they do so (C99, §6.7.4):
Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined.
A macro is less likely to incur such overhead (though again, there is little to prevent a compiler from somehow doing something; the standard doesn't define what machine code programs must expand to, only the observable behavior of a compiled program).
Use whatever is cleaner. Profile. If it matters, do something different.
Also, what fizzer said; calls to pow (and division) are both typically more expensive than function-call overhead. Minimizing those is a good start:
double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += 4 * Epsilon * AttractiveTerm * (AttractiveTerm - 1.0);
Is EnergyContribution made up only of terms that look like this? If so, pull the 4 * Epsilon out and save two multiplies per iteration:
double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += AttractiveTerm * (AttractiveTerm - 1.0);
// later, once you've done all of those terms...
EnergyContribution *= 4 * Epsilon;
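To make the "save two multiplies per iteration" concrete, the accumulation might be structured like this (the loop bound and array names are invented for the sketch):

double EnergyContribution = 0.0;
for (int i = 0; i < NumPairs; i++) {   /* hypothetical loop over pair terms */
    double ratio = SigmaSquared[i] / RadialDistanceSquared[i];
    double AttractiveTerm = ratio * ratio * ratio;
    EnergyContribution += AttractiveTerm * (AttractiveTerm - 1.0);
}
EnergyContribution *= 4 * Epsilon;      /* applied once, after the loop */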
A macro is not really a function. Whatever you define as a macro gets pasted verbatim into your code by the preprocessor, before the compiler ever sees it. The preprocessor is just a software engineer's tool that enables various abstractions to better structure your code.
A function, inline or otherwise, is something the compiler does know about and can make decisions on. A user-supplied inline keyword is just a suggestion, and the compiler may override it. It is this overriding that in most cases results in better code.
Another side effect of the compiler being aware of functions is that you can force certain decisions on it: for example, disabling inlining of your code, which can make it easier to debug or profile. There are probably many other use cases that inline functions enable vs. macros.
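For instance, GCC (and ICC, which accepts most GCC extensions) lets you pin that decision either way with function attributes; a sketch, shown on a trivial cube() helper purely for illustration:

/* Ask the compiler to always inline this one. */
static inline double cube(double x) __attribute__((always_inline));
static inline double cube(double x) { return x * x * x; }

/* Keep this one out-of-line so it shows up as its own frame
 * when debugging or profiling. */
static double cube_outlined(double x) __attribute__((noinline));
static double cube_outlined(double x) { return x * x * x; }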
Macros are extremely powerful though, and to back this up I would cite Google Test and Google Mock. There are many reasons to use macros :D.
Simple mathematical operations that are chained together through functions are often inlined by the compiler, especially if the function is only called once in the translation unit. So I wouldn't be surprised if the compiler makes inlining decisions for you regardless of whether the keyword is supplied.
However, if the compiler doesn't, you can manually flatten out segments of your code. If you do flatten it out, macros may serve as a good abstraction; after all, they present similar semantics to a "real" function.
The Crux
So: do you want the compiler to be aware of certain logical boundaries so it can produce better machine code, or do you want to force decisions on the compiler by flattening your code out manually or by using macros? The industry leans towards the former.
I would lean towards using macros in this case, just because it's quick and dirty and doesn't require learning much more. However, since macros are a software-engineering abstraction, and since you are concerned with the code the compiler generates, if the problem became slightly more advanced I would use C++ templates, as they were designed for exactly the concerns you are pondering.
It's the calls to pow() you want to eliminate. This function takes general floating point exponents and is inefficient for raising to integral exponents. Replacing these calls with e.g.
inline double cube(double x)
{
return x * x * x;
}
is the only thing which will make a significant difference to your performance here.
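Applied to the code in the question, that might read (sketch):

double AttractiveTerm = cube(SigmaSquared / RadialDistanceSquared);
double RepulsiveTerm = AttractiveTerm * AttractiveTerm;
EnergyContribution += 4 * Epsilon * (RepulsiveTerm - AttractiveTerm);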
Macros, including function-like macros, are simple text substitutions, and as such can bite you in the ass if you're not really careful with your parameters. For example, the ever-so-popular SQUARE macro:
#define SQUARE(x) ((x)*(x))
can be a disaster waiting to happen if you call it as SQUARE(i++), which expands to ((i++)*(i++)) and invokes undefined behavior. Also, function-like macros have no concept of scope and don't support local variables; the most popular hack is something like
#define MACRO(S,R,E,C) \
do \
{ \
double AttractiveTerm = pow((S)/(R),3); \
double RepulsiveTerm = AttractiveTerm * AttractiveTerm; \
(C) = 4 * (E) * (RepulsiveTerm - AttractiveTerm); \
} while(0)
which, of course, makes it hard to assign a result like x = MACRO(a,b);.
The best bet from a correctness and maintainability standpoint is to make it a function and specify inline. Macros are not functions, and should not be confused with them.
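For the snippet in the question, that might look like this sketch (the function name is invented):

#include <math.h>

static inline double EnergyTerm(double SigmaSquared,
                                double RadialDistanceSquared,
                                double Epsilon)
{
    double AttractiveTerm = pow(SigmaSquared / RadialDistanceSquared, 3);
    double RepulsiveTerm  = AttractiveTerm * AttractiveTerm;
    return 4 * Epsilon * (RepulsiveTerm - AttractiveTerm);
}

The caller then writes EnergyContribution += EnergyTerm(...), which, unlike the do/while macro above, composes naturally inside expressions.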
Once you've done that, measure the performance and find where any actual bottleneck is before hacking at it (the call to pow would certainly be a candidate for streamlining).
Please review the CERT Secure Coding Standard's discussion of macros and inline functions in terms of security and bug-proneness. I do not encourage using function-like macros, because:
- they are harder to profile
- they are less traceable
- they are harder to debug
- they can lead to severe bugs
The best way to answer your question is to benchmark both approaches to see which is actually faster in your application, using your test data. Predictions about performance are notoriously unreliable except at the coarsest levels.
That said, I would expect there to be no significant difference between a macro and a truly inlined function call. In both cases, you should end up with the same assembly code under the hood.
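A minimal sketch of such a benchmark (the iteration count, inputs, and energy expressions are made up for illustration):

#include <stdio.h>
#include <time.h>

/* Both forms compute 4*E*a*(a-1) with a = (S/R)^3. Note the macro
 * re-evaluates S/R several times; the compiler usually folds that,
 * but it is exactly the kind of difference a benchmark would expose. */
#define ENERGY_MACRO(S, R, E)                                   \
    (4 * (E) * (((S)/(R)) * ((S)/(R)) * ((S)/(R)))              \
             * ((((S)/(R)) * ((S)/(R)) * ((S)/(R))) - 1.0))

static inline double energy_fn(double S, double R, double E)
{
    double a = (S / R) * (S / R) * (S / R);
    return 4 * E * a * (a - 1.0);
}

int main(void)
{
    const long N = 100000000L;     /* made-up iteration count */
    volatile double sum = 0.0;     /* volatile discourages over-optimization */
    clock_t t0 = clock();
    for (long i = 0; i < N; i++)
        sum += ENERGY_MACRO(1.1, 1.0 + i * 1e-9, 0.5);
    clock_t t1 = clock();
    for (long i = 0; i < N; i++)
        sum += energy_fn(1.1, 1.0 + i * 1e-9, 0.5);
    clock_t t2 = clock();
    printf("macro:  %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("inline: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

With optimization enabled, the two loops will often compile to essentially the same code, which is rather the point of measuring before worrying.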
If you random-pause this, what you're probably going to see is that 100% (minus epsilon) of the time is inside the pow function, so how it got there makes basically no difference.
Assuming you find that, the first thing to do is get rid of the calls to pow that you found on the stack.
(In general, what pow does is take the log of the first argument, multiply it by the second argument, and take exp of that, or something equivalent. The log and exp could well be done by some kind of series involving a lot of arithmetic. It checks for special cases, of course, but it's still going to take far longer than the handful of multiplies you actually need.)
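To illustrate the point (this is not how any particular libm actually implements it), a general-exponent pow for positive bases behaves roughly like:

#include <math.h>

/* Illustration only: for x > 0, pow(x, y) is roughly exp(y * log(x)),
 * which is why it costs so much more than x*x*x for a cube. */
double pow_ish(double x, double y)
{
    return exp(y * log(x));
}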
That alone should give you around an order of magnitude speedup.
Then do the random-pausing again. Now you're going to see something else taking a lot of the time. I can't guess what it will be, and neither can anyone else, but you can probably reduce that too. Just keep doing it until you can't any more.
It may happen along the way that you choose to use a macro, and it might be slightly faster than an inline function. That's for you to judge when you get there.
As others have said, it mostly depends on the compiler.
I bet "pow" costs you more than any inlining or macro will save you :)
I think it's cleaner if it's an inline function rather than a macro.
Caching and pipelining are really where you're going to get good gains on a modern processor, i.e., removing branching statements like 'if' makes an enormous difference (and can be done with a number of tricks).
As I understand it from some people who write compilers, once you call another function from inside your function, it is not very likely your code will be inlined anyway. But that is also why you should not use a macro: macros remove information and leave the compiler with far fewer options to optimize. With multi-pass compilers and whole-program optimization, the compiler will know whether inlining your code would cause a failed branch prediction, a cache miss, or run afoul of the other black magic modern CPUs use to go fast. I think everyone is right to point out that the code above is not optimal anyway, so that is where the focus should be.