What type of bug causes a program to slowly use more processor power and then suddenly go to 100%?

I was hoping to get some good ideas as to what might be causing a really nasty bug.

This is a program which is transmitting data over a socket, and also receives messages back. I could explain lots more, but I don't think this will help here.

I'm just searching for hypothetical problems which can cause the following behaviour:

  • The program runs.
  • Processor time slowly accumulates (to around 60%).
  • All of a sudden (it could be after 30 seconds, but also after 60) the processor time shoots to 100% and the program halts completely.
  • In my syslog it always ends on one line with a memory allocation (something similar to: myArray = new byte[16384]) in the same thread.

Now here is the weird part: if I set a breakpoint anywhere, the debugger immediately stops on that line. So just the act of setting a breakpoint made the thread continue (it had not been running before, since I saw no more log output).

I was thinking 'deadlock', but that would not cause 100% processor usage. If anything, the opposite. Also, setting a breakpoint would not cause a deadlock to end.

Does anyone have a theoretical suggestion as to what kind of 'construct' might cause this effect? (apart from 'bad programming') ;^)

thanks

EDIT: I just noticed that when I set the send speed lower, the problem shows up much later than expected. I would have expected it to appear after roughly the same number of packets sent, but the number of packets sent before the problem occurs is much higher this way.


I can only guess, but the opposite of a deadlock would be a livelock. This means two threads that keep reacting to each other in an infinite loop, staying busy without making progress. Setting a breakpoint could plausibly interrupt this, as livelocks generally depend on exact timing.
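To make the timing dependence concrete, here is a toy Python simulation of a livelock. It is not the asker's code, and `rounds_until_pass` and `b_delay` are hypothetical names chosen for illustration. Two parties start on the same side of a corridor and each sidesteps once per round. In perfect lockstep they mirror each other forever; delay one of them by a single round, roughly what a breakpoint does to a thread, and they get past each other.

```python
def rounds_until_pass(b_delay, max_rounds=1000):
    """Return the round in which A and B end up on different sides, or None.

    b_delay is the number of rounds B waits before it starts sidestepping.
    """
    a_side, b_side = 0, 0  # both start on the same side, blocking each other
    for round_no in range(1, max_rounds + 1):
        a_side ^= 1              # A politely steps to the other side
        if round_no > b_delay:
            b_side ^= 1          # B does the same, unless it is still delayed
        if a_side != b_side:
            return round_no      # they no longer block each other
    return None                  # still mirroring each other: livelock


lockstep = rounds_until_pass(b_delay=0)   # both react at the same time
perturbed = rounds_until_pass(b_delay=1)  # B reacts one round late
```

In the lockstep case the loop burns all of its rounds without ever resolving, which is the "busy but stuck" signature that distinguishes a livelock from a deadlock.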

Other than this, I once had a similar issue with the Java nio classes: because they are non-blocking, the main thread ended up busy-waiting for input. In that case, though, the CPU usage rose instantaneously, not after a few seconds.
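The busy-wait pattern is not specific to Java. As a rough sketch in Python (using a local `socket.socketpair()` as a stand-in for a real connection; the function names are made up for this example), polling a non-blocking socket returns immediately every time, so a loop around it spins on the CPU, whereas `select.select` lets the thread sleep until data actually arrives.

```python
import select
import socket


def count_empty_polls(sock, max_polls):
    """Poll a non-blocking socket; count how many polls found nothing to read."""
    sock.setblocking(False)
    empty_polls = 0
    for _ in range(max_polls):
        try:
            sock.recv(4096)
        except BlockingIOError:
            empty_polls += 1  # no data yet, but we still used CPU to ask
    return empty_polls


def wait_for_data(sock, timeout):
    """Sleep in the kernel until the socket is readable or the timeout expires."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


a, b = socket.socketpair()
spins = count_empty_polls(a, 1000)  # every poll returns at once with no data
idle = wait_for_data(a, 0.05)       # nothing was sent, so this times out
b.sendall(b"x")
ready = wait_for_data(a, 1.0)       # data is pending, so this returns quickly
a.close()
b.close()
```

An unbounded version of the first loop (`while True:` instead of a fixed poll count) is the shape that pins a core at 100%.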

Maybe if you can provide a bit more information like the programming language or even a code sample there might be more ideas.


Anything that involves repetitive processing (looping, recursion, etc) can cause this.

What's interesting is that if the program is doing anything that normally slows down performance (such as disk IO or network access), then the processor is less likely to peg. The processor pegs at 100% only if the program is actually using the processor. If a thread has to wait for disk or network IO, it is blocked and uses no CPU while it waits.

So in the code, I'd check for loops where a lot of work is going on, but little IO.
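One way to see the difference is to compare wall-clock time with CPU time. The following Python sketch is only an illustration: `time.sleep` stands in for a blocking IO call, and `measure`, `wait_like_io` and `spin` are hypothetical helper names.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) it consumed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def wait_like_io():
    time.sleep(0.2)  # stand-in for a blocking read: the thread is parked


def spin():
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass  # busy loop: the thread never gives up the CPU


sleep_wall, sleep_cpu = measure(wait_like_io)
spin_wall, spin_cpu = measure(spin)
```

Both calls take about the same wall-clock time, but only the spinning one should accumulate a comparable amount of CPU time. A loop whose CPU time tracks its wall time is the kind to look for.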

Also, if you're debugging in Visual Studio, you can hit the pause button to stop the app at the current point and see what your code is doing when it locks.


I'm guessing an infinite loop in the socket receiving end. It keeps trying to allocate a buffer to receive the data that is coming in, but the buffer is never big enough so it keeps allocating. But it is really hard to say without code. I'd advise you to add more logging and/or single step the code if you don't want to share it.
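A hedged sketch of that failure mode in Python (the asker's real code is unknown; `allocations_until_fit` and its parameters are invented for illustration, and only the 16384 figure comes from the question): a retry loop that allocates a fresh buffer each time only terminates if the buffer actually grows. The `max_attempts` guard is there so the broken variant stops instead of spinning forever.

```python
def allocations_until_fit(message_len, buf_size, grow, max_attempts=1000):
    """Count the buffers allocated before one is large enough for the message.

    Returns None if no buffer was large enough within max_attempts.
    """
    attempts = 0
    while attempts < max_attempts:
        buf = bytearray(buf_size)  # comparable to: myArray = new byte[16384]
        attempts += 1
        if len(buf) >= message_len:
            return attempts
        if grow:
            buf_size *= 2          # the fix: make the next buffer bigger
    return None                    # the buffer never fit; without the guard this loops forever


with_growth = allocations_until_fit(100_000, 16384, grow=True)
without_growth = allocations_until_fit(100_000, 16384, grow=False)
```

Without the guard, the non-growing variant would show exactly the reported symptom: the last log line is always the allocation, and the CPU is fully busy.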


Without seeing code, I can only say that your program is probably stuck in an infinite loop, and that a call you expect to block is not actually blocking.


You can also try profiling (the free EQUATEC profiler, for example). It will show you how much of your processor time was spent in each method.
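The same idea works in any environment that ships a profiler. As an illustration only, here is what it looks like with Python's standard `cProfile` and `pstats` modules (a different tool from the EQUATEC profiler mentioned above; `hot_loop`, `cheap_setup` and `main` are placeholder functions).

```python
import cProfile
import io
import pstats


def hot_loop(n):
    """Deliberately CPU-heavy function that should dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    return list(range(10))


def main():
    cheap_setup()
    return hot_loop(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
report_text = report.getvalue()
```

Printing `report_text` lists each function with its call count and time, which is usually enough to spot the loop that is eating the processor.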


I found the answer... quite silly actually (it always is). The thread which sends and receives messages does so via asynchronous methods. However, the asynchronous callbacks never seem to get a chance to run while that same thread is also pumping messages into the send queue. I noticed that when I put in a thread.sleep every second, all pending asynchronous callbacks are processed. So the solution, it turns out, is to have one thread for sending/receiving, done purely with async calls, and a separate thread for filling the send queue.

Why this would have resulted in 100% processor usage is beyond me. But it does explain why setting a breakpoint allowed the async callbacks to catch up.
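The split described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the original code: it uses a blocking `queue.Queue` between two threads rather than asynchronous socket callbacks, `transmit` is a placeholder for the real send call, and all names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells the sender thread to exit


def fill_send_queue(send_queue, messages):
    """Producer thread: its only job is to enqueue outgoing messages."""
    for message in messages:
        send_queue.put(message)
    send_queue.put(STOP)


def send_loop(send_queue, transmit):
    """Sender thread: its only job is to drain the queue and transmit."""
    while True:
        message = send_queue.get()  # blocks while the queue is empty, no spinning
        if message is STOP:
            break
        transmit(message)


def run_demo(messages):
    send_queue = queue.Queue()
    sent = []  # stands in for the socket; transmit just records the message
    sender = threading.Thread(target=send_loop, args=(send_queue, sent.append))
    producer = threading.Thread(target=fill_send_queue, args=(send_queue, messages))
    sender.start()
    producer.start()
    producer.join()
    sender.join()
    return sent


sent = run_demo([b"a", b"b", b"c"])
```

Because neither thread does the other's work, a burst of enqueueing can no longer starve the code that handles the sending side.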


Because the program fails while allocating memory, I would guess that the incoming message rate is too high for it to handle.

I imagine that your program has a thread whose only job is to listen to the socket and hand the incoming messages to other threads for processing (maybe you have a thread pool there). Imagine a situation where the incoming message rate is so high that all the worker threads are busy handling previous messages, and the thread that listens to the socket has to put the new messages into some kind of queue until one of the worker threads is free to handle them. This queue will grow and grow until you run out of memory. That could be the reason for your program's termination.
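A small deterministic simulation shows how quickly such a backlog builds up. It is a Python sketch with made-up rates and hypothetical names (`simulate_backlog`, `max_queue`), not a model of the actual program. With an unbounded queue the backlog grows by the difference between arrival and handling rate on every tick; with a bounded queue the excess is rejected instead, so memory use stays flat.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick, handled_per_tick, ticks, max_queue=None):
    """Return (final_backlog, rejected) after the given number of ticks.

    max_queue=None models an unbounded queue; otherwise messages that
    arrive while the queue is full are rejected.
    """
    backlog = deque()
    rejected = 0
    for tick in range(ticks):
        # The listener thread enqueues everything that arrived this tick.
        for i in range(arrivals_per_tick):
            if max_queue is not None and len(backlog) >= max_queue:
                rejected += 1
            else:
                backlog.append((tick, i))
        # The workers only get through a limited number of messages per tick.
        for _ in range(min(handled_per_tick, len(backlog))):
            backlog.popleft()
    return len(backlog), rejected


unbounded = simulate_backlog(3, 2, ticks=100)
bounded = simulate_backlog(3, 2, ticks=100, max_queue=10)
```

This also fits the observation in the question's EDIT: lowering the send rate shrinks the gap between arrivals and handling, so the backlog takes far longer to reach the point of failure.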

Now, about the 100% CPU. I guess that the thread that uses the CPU must be one of the worker threads. This would explain why the listening thread is queuing the messages. The cause could be a corrupted message or something else that sends a worker into an infinite loop. "frenetisch applaudierend" suggested in his answer that two or more of the worker threads could "livelock" each other, which could also be the reason for your problem.
