C++, most efficient way to modulate sign (+/-) in an expression
I have an expression
x += y;
and, based on a boolean, I want to be able to change it to
x -= y;
Of course I could do
if (i) { x += y; } else { x -= y; }
// or
x += (y * sign);   // where sign is either 1 or -1
But if I have to do this iteratively, I want to avoid the extra computation. Is there a more efficient way? Is it possible to modulate the operator?
if (i) {x += y;} else {x -= y;}
is probably going to be as efficient as anything else you can do. y * sign is likely to be fairly expensive (unless the compiler can figure out that sign is guaranteed to be 1 or -1).
The most efficient way to do this iteratively is to precompute the data you need.
So, precomputation:
const YourNumberType increment = (i? y : -y);
Then in your loop:
x += increment;
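A minimal sketch of that pattern, assuming y and i stay fixed for the whole loop (the function name, the parameter names, and the iteration count are illustrative assumptions):

long long run(long long x, int y, bool i, int iterations)
{
    // The sign decision is made once, so the loop body is a single addition.
    const int increment = i ? y : -y;   // precomputed outside the loop
    for (int n = 0; n < iterations; ++n)
    {
        x += increment;                 // no branch, no multiplication per iteration
    }
    return x;
}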
EDIT: regarding the question in the comments about how to generate such code, like this:
#include <stdio.h>

void display( int x ) { printf( "%d\n", x ); }

template< bool isSomething >
inline void advance( int& x, int y );

template<> inline void advance<true>( int& x, int y ) { x += y; }
template<> inline void advance<false>( int& x, int y ) { x -= y; }

template< bool isSomething >
void myFunc()
{
    int x = 314;
    int y = 271;
    for( ;; )
    {
        advance< isSomething >( x, y );   // The nano-optimization.
        display( x );
        if( !( -10000 < x && x < 10000 ) ) { return; }
    }
}

int main( int n, char*[] )
{
    n > 1? myFunc<true>() : myFunc<false>();
}
E.g. with Visual C++ 10.0 this generates two versions of myFunc, one with an add instruction and the other with a sub instruction.
Cheers & hth.,
On a modern pipelined machine you want to avoid branching if at all possible in those cases where performance really does count. When the front of the pipeline hits a branch, the CPU guesses which branch to take and lets the pipeline work ahead based on that guess. Everything is fine if the guess was right. Everything is not so fine if the guess was wrong, particularly so if you're still using one of Intel's processors such as a Pentium 4 that suffered from pipeline bloat. Intel discovered that too much pipelining is not a good thing.
More modern processors still do use pipelining (the Core line has a pipeline length of 14 or so), so avoiding branching still remains one of those good things to do -- when it counts, that is. Don't make your code an ugly, prematurely optimized mess when it doesn't count.
The best thing to do is to first find out where your performance demons lie. It is not at all uncommon for a tiny fraction of one percent of the code base to be the cause of almost all of the CPU usage. Optimizing the 99.9% of the code that doesn't contribute to the CPU usage won't solve your performance problems but it will have a deleterious effect on maintenance.
You optimize once you have found the culprit code, and even then, maybe not. When performance doesn't matter, don't optimize. Performance as a metric runs counter to almost every other code quality metric out there.
So, getting off the soap box, let's suppose that little snippet of code is the performance culprit. Try both approaches and test. Try a third approach you haven't thought of yet and test. Sometimes the code that is the best performance-wise is surprisingly non-intuitive. Think Duff's device.
If i stays constant during the execution of the loop, and y doesn't, move the if outside of the loop.
So instead of...

your_loop {
    y = ...;
    if (i)
        x += y;
    else
        x -= y;
}

...do the following:

if (i) {
    your_loop {
        y = ...;
        x += y;
    }
}
else {
    your_loop {
        y = ...;
        x -= y;
    }
}
BTW, a decent compiler will do that optimization for you, so you may not see the difference when actually benchmarking.
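For concreteness, a compilable sketch of the hoisted form (the container and the way y is computed are assumptions purely for illustration):

#include <vector>

// The test on i happens once, not once per element.
long long sum_with_sign(const std::vector<int>& data, bool i)
{
    long long x = 0;
    if (i) {
        for (int v : data) { int y = v; x += y; }   // "y = ..." stands in for real per-iteration work
    }
    else {
        for (int v : data) { int y = v; x -= y; }
    }
    return x;
}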
Sounds like you want to avoid branching and multiplication. Let's say the switch i is set to all 1 bits (same size as y) when you want to add, and to 0 when you want to subtract. Then:
x += (y & i) - (y & ~i);
Haven't tested it, this is just to give you the general idea. Bear in mind that this makes the code a lot harder to read in exchange for what would probably be a very small increase in efficiency.
Edit: or, as bdonlan points out in the comments, possibly even a decrease.
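A hedged sketch of how such a mask could be built from a plain bool (this relies on two's-complement integers, which the trick above already assumes):

// true -> all 1 bits, false -> 0; then select the sign without a branch.
int apply_sign(int x, int y, bool add)
{
    const int i = -static_cast<int>(add);   // mask: all 1s when add is true, 0 otherwise
    x += (y & i) - (y & ~i);                // adds y when the mask is all 1s, subtracts it otherwise
    return x;
}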
I put my suggestion in the comments to the test: in a simple benchmark the bit-fiddling variants are faster than the branching options on an Intel(R) Xeon(R) CPU L5520 @ 2.27GHz, but slower on my laptop's Intel Core Duo.
If you are free to give i either the value 0 (for +) or ~0 (for -), these statements are equivalent:
// branching:
if ( i ) sum -= add; else sum += add;
sum += i?-add:add;
sum += (i?-1:1)*add;
// bit fiddling:
sum += (add^i)+(i&1);
sum += (add^i)+(!!i);
sum += (~i&add)-(i&add);
And as said, one method can beat the other by a factor of 2, depending on CPU and optimization level used.
Conclusion, as always, is that benchmarking is the only way to find out which is faster in your particular situation.
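If you do benchmark, a rough harness along these lines is enough to compare variants (the iteration count, the varying operand, and the chosen variant are arbitrary assumptions; compile with optimizations on and keep the result observable so the loop is not folded away):

#include <chrono>
#include <cstdio>

int main(int argc, char**)
{
    const int i = (argc > 1) ? ~0 : 0;   // ~0 selects subtraction in the convention above
    long long sum = 0;

    auto start = std::chrono::steady_clock::now();
    for (long long n = 0; n < 100000000LL; ++n)
    {
        const int add = (int)n & 0xFF;   // varying operand, so the loop does real work
        sum += (add ^ i) + (i & 1);      // one of the bit-fiddling variants
    }
    auto stop = std::chrono::steady_clock::now();

    std::printf("sum=%lld, %lld ns\n", sum,
        (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
}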