How much impact (time) can a single 'if' statement have within a tight loop?
I'm working on an application in C++ that uses a tight loop to iterate through the states in our FSM. Right now, because of the tight loop, it uses 100% CPU, and our customers don't like that. I wanted to try putting a sleep(1) in the tight loop to loosen it up, but we're worried that will make it sleep too long between states for our large customers (whose states change very quickly!). I was thinking of trying something like this:
if(smallcustomer)
{
    sleep(1);
}
And smallcustomer would be defined somewhere else when the program started up. Does that 'if' statement just slow things down as much as the sleep would, thereby defeating its own purpose?
Your implication is that the FSM shouldn't actually need 100% of the CPU, which leads me to assume you are spending a lot of time doing nothing except checking to see if you need to move to the next state or not. You say you are worried about sleeping "too long" for larger customers, which means you are concerned that some event will be missed: a queue filling, a mouse click, a packet being received, completion of disk I/O, a key being pressed, whatever. You should refactor to trigger asynchronously on that event (or events) instead of hogging the CPU doing nothing in a loop.
Short answer: I suspect that the simple if() statement won't hurt much. However, the golden rule of optimization is Test Test Test.
Longer answer: A somewhat cleaner (though more difficult) approach would be to move the time-consuming FSM processing to a separate / background thread, perhaps with lower scheduling priority. This might give you the best of both worlds: Fast processing when CPU is free, and less hogging of the system, due to lower priority.
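A minimal sketch of that idea, assuming one FSM step can be factored into a function (step_fsm, running, and the priority calls are illustrative, not from the original code):

#include <atomic>
#include <thread>

std::atomic<bool> running{true};

// Hypothetical stand-in for one step of the FSM.
void step_fsm() { /* ... advance the state machine ... */ }

void fsm_loop() {
    while (running.load(std::memory_order_relaxed))
        step_fsm();
}

int main() {
    std::thread worker(fsm_loop);
    // Lowering the worker's priority is platform-specific, e.g.
    // SetThreadPriority(worker.native_handle(), ...) on Win32 or
    // pthread_setschedparam() on POSIX.
    // ... main thread stays responsive, handles I/O, etc. ...
    running = false;
    worker.join();
}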
Just my 2 cents...
A simple if statement like that should compile down to a compare and a conditional branch (one instruction on some platforms, two on others), which will execute very quickly.
Rather than forcing your app to sleep, see if your platform supports cooperative multitasking (so your process can yield execution) or a way of lowering the scheduling priority for your process.
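For instance, with C++11 the loop can give up the rest of its time slice each pass; a sketch, with the loop body left as a placeholder:

#include <thread>

void fsm_loop() {  // hypothetical wrapper for the tight loop
    for (;;) {
        // ... one FSM step goes here ...
        std::this_thread::yield();  // maps to sched_yield()/SwitchToThread()
    }
}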
If the platform supports it, usleep will give you finer-grained control than sleep.
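A sketch of the question's guard using usleep instead (the 1000 µs value is an arbitrary example, not from the question):

#include <unistd.h>  // usleep() — POSIX; nanosleep() is the modern form

extern bool smallcustomer;  // assumed: set once at startup, as in the question

void throttle() {
    if (smallcustomer)
        usleep(1000);  // pause 1 ms instead of sleep(1)'s full second
}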
Assuming your FSM is working as designed, it ought to be possible to explain to the affected customers that idle CPUs are not indicative of a well-designed program. Any code that never hits a wait state is going to use 100% CPU during its time slice. That would be a selling point, in many cases.
What you are doing is going to decrease the performance of your app. Are you sure that's what you want?
If the FSM is looping excessively, or consistently starving other programs of CPU, that's a different discussion. Profiling the app to enable a target code review and modifications could help out in this case.
First, you should avoid busy waits wherever possible. They are rarely justified outside of real-time embedded applications, and even there, if your controller does more than one thing in parallel, you would implement it with interrupts if possible.
Whatever your reason, you can take that if out of the for loop as follows:
#include <unistd.h>  // for sleep()

extern bool smallcustomer;  // defined elsewhere, set at startup

template<bool flag> void myf() {
    for(;;) {
        // your loop body goes here
        if(flag)
            sleep(1);
    }
}

void f() {
    smallcustomer ? myf<true>() : myf<false>();
}
This way you force the compiler to instantiate two copies of myf, and within each copy flag is a compile-time constant. In both cases the compiler will eliminate the condition inside the tight loop.
This is a well-known compiler optimization called loop unswitching. Most modern compilers can already do this automatically from your original code, but they need some form of hint to determine that the performance gain justifies the duplicated machine code; that hint usually comes in the form of profile-guided optimization.
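As a rough illustration with GCC (the file name fsm.cpp is hypothetical, and flag behavior varies by version):

# build instrumented, run a representative workload, then rebuild with the profile
g++ -O3 -fprofile-generate fsm.cpp -o fsm && ./fsm
g++ -O3 -fprofile-use fsm.cpp -o fsm
# loop unswitching itself is -funswitch-loops, enabled at -O3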
You should use a more specific synchronization tool than just sleep() - for example, waitable timers in Win32, although I don't have an equivalent for Linux/Mac. sleep()-style functions are notorious for being unreliable. What would be best is if you could allow the customer to alter the timer period.
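A minimal Win32 sketch of that approach (the period parameter is the customer-tunable knob; error handling and cleanup omitted):

#include <windows.h>

// Sketch: wake the loop once per period via a waitable timer.
void fsm_loop(LONG period_ms) {
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);  // auto-reset
    LARGE_INTEGER due;
    due.QuadPart = -10000LL * period_ms;  // relative due time, 100 ns units
    SetWaitableTimer(timer, &due, period_ms, NULL, NULL, FALSE);
    for (;;) {
        WaitForSingleObject(timer, INFINITE);  // blocks; no busy-wait
        // ... advance the FSM ...
    }
    // CloseHandle(timer);  // unreachable in this sketch
}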
Modern CPUs use a rather complex chain of conditions to guess which instruction will follow a conditional branch. Because the CPU decodes and processes each instruction in parallel with many other instructions, the cost of guessing wrong can be disastrous. Just reordering the tests in the branch, or even the code that comes immediately before or after, can cause the prediction to change. Unfortunately, there's no easy way to predict what will work better.
For this reason, the only way to make informed decisions about optimizing CPU-bound code (as you are describing) is to measure what the code is actually doing, making small changes, and measuring again to see if there was any improvement.
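A minimal measuring sketch along those lines, where the callable wraps whichever loop variant is under test (a helper I'm introducing for illustration, not from the answers):

#include <chrono>

// Sketch: time one variant of the code under test with a steady clock.
template <class Fn>
long long micros_taken(Fn fn) {
    auto t0 = std::chrono::steady_clock::now();
    fn();  // the hypothetical code variant being measured
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}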
If you're really running a soft real-time application and it is using 100% CPU, that probably doesn't mean you should try to scale the usage back, but rather provide more CPU for it to use, because the input is outrunning the application's ability to keep up. In fact, scaling up or out is probably cheaper than improving the code's performance; server hardware is cheap compared to developer time.
If you are really worried about conditionals, try this suggestion for branch prediction in GCC:
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
if(likely(smallcustomer))
{
    sleep(1);
}
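Incidentally, C++20 standardizes this hint as an attribute, so on a new-enough compiler no GCC-specific macro is needed:

if (smallcustomer) [[likely]]
{
    sleep(1);
}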