Total system freezing when using timers in graphical application
I’m really stuck with this issue and will greatly appreciate any advice.
The problem: Some of our users complain about total system “freezing” when using our product. No matter how we tried, we couldn’t reproduce it in any of systems available for troubleshooting.
The product: Physically, it’s a 32bit/64bit DLL. The product has a self-refreshing GUI, which draws a realtime spectrogram of an audio signal
Problem details: What I managed to collect from a number of fragmentary reports makes the following picture:
When GIU is opened, sometimes immediately, sometimes after a few minutes of GIU being visible, the system completely stalls, without possibility to operate with windows, start Task Manager etc. No reactions on keyboard, no mouse cursor seen (or it’s seen but is not responsibe to mouse movements – this I do not know). The user has to hard-reset the system in order to reboot. What is important, I think, is that (in some cases) for some time the GIU is responsive and shows some adequate pictures. Then this freezing happens. One of the reports tells that once the system was frozen, the audio continued to be rendered – i.e. heard by the reporter (but the whole graphic shell of Windows was already frozen). Note: in this sort of apps it’s usually a specialized thread which is responsible for sound processing.
The freezing is more or less confirmed to happen for 2 users on Windows7 x64 using both 32 and 64 bit versions of the DLL, never heard of any other OSs mentioned with connection to this freezing (though there was 1 report without any OS specified).
That’s all that I managed to collect.
The architecture / suspicions:
I strongly suspect that it’s the GUI refreshing cycle that is a culprit.
Basically, it works like this:
- There is a timer that triggers callbacks at a frame rate of approx 25 fps.
- In this callback audio analysis is performed and GUI updated
Some details about the timer:
It’s based on this call:
CreateTimerQueueTimer(&m_timerHandle, NULL, xPlatformTimerCallbackWrapper,
this, m_firstExpInterval, m_period, WT_EXECUTEINTIMERTHREAD);
We create a timer and m_timerHandle
is called periodically.
Some details about the GUI refreshing:
It works like this:
HDC hdc = GetDC (hwnd);
// Some drawing
ReleaseDC(hwnd,hdc);
My intuition tells me that this CreateTimeQueueTimer
might be not the right decision. The reference page tells that in case of using WT_EXECUTEINTIMERTHREAD
:
The callback function is invoked by the timer thread itself. This flag should be used only for short tasks or it could affect other timer operations. The callback function is queued as an APC. It should not perform alertable wait operations.
I don’t remember why this WT_EXECUTEINTIMERTHREAD
option was chosen actually, now WT_EXECUTEDEFAULT
seems equally suitable for me.
In fact, I don’t see any major difference in using any of the options mentioned in the reference page.
Questions:
- Is anything of what was told give anyone any clue on what might be wrong?
- Have you faced similar problems, what was the reason?
Thanks for any info!
==========================================
Update: 2010-02-20
Unfortunatelly, the advise given here (which I could check so far) didn't help, namelly:
- changing to WT_EXECUTEDEFAULT in
CreateTimerQueueTimer(&m_timerHandle,NULL,xPlatformTimerCallbackWrapper,this,m_firstExpInterval,m_period, WT_EXECUTEDEFAULT);
- the reenterability guard was already there
- I havent' yet checked if updateding the GUI in WM_PAINT hander helps or not
Thanks for the hints anyway.
Now, I've been playing with this for a while, also got a real W7 intallation (I used to use the virtual one) and it seems that the problem can be narrowed down.
On my installation, using of the app really get the GUI far less responsive, although I couldn't manage to reproduce a total system freezing as someone reported.
My assumption now is this responsiveness degradation and reported total freezing have a common origin.
Then I did some primitive profiling and found that at least one of the culprits is BitBlt function that is called approx 50 times a second
BitB开发者_Python百科lt ((HDC)pContext->getSystemContext (), // hdcDest
destRect.left + pContext->offset.h,
destRect.top + pContext->offset.v,
destRect.right - destRect.left,
destRect.bottom - destRect.top,
(HDC)pSystemContext,
srcOffset.h,
srcOffset.v,
SRCCOPY);
The regions being copied are not really large (approx. 400x200 pixels). It is used for displaying the backbuffer and is executed in the timer callback.
If I comment out this BitBlt call, the problem seems to disappear (at least partly).
On the same machine running WinXP everything works just fine.
Any ideas on this?
Most likely what's happening is that your timer callback is taking more than 25 ms to execute. Then another timer tick comes along and it starts processing, too. And so on, and pretty soon you have a whole bunch of threads sucking down CPU cycles, all trying to do your audio analysis and in short order the system is so busy doing thread context switches that no real work gets done. And all the while, more and more timer ticks are getting placed into the queue.
I would strongly suggest that you use WT_EXECUTEDEFAULT
here, rather than WT_EXECUTEINTIMERTHREAD
. Also, you need to prevent overlapping timer callbacks. There are several ways to do that.
You can use a critical section in your timer callback. When the callback is triggered it calls TryEnterEnterCriticalSection
and if not successful, just returns without doing anything.
You can do something similar using a volatile
variable and InterlockedCompareExchange
.
Or, you can change your timer to be a one-shot (WT_EXECUTEONLYONCE
), and then re-set the timer at the end of every callback. That would make the thing execute 25 ms after the last one completed.
Which you choose is up to you. If your analysis often takes longer than 25 ms but not more than 35 ms, then you'll probably get a smoother update rate using WT_EXECUTEONLYONCE
. If it's rare that analysis takes more than 25 ms, or if it often takes more than about 35 ms (but less than 50 ms), then you're probably better off using one of the other techniques.
Of course, if it often takes longer than 25 ms, then you probably want to increase the time (reduce the update rate).
Also, as one of the commenters pointed out, it's possible that the problem also involves accessing the GUI from the timer thread. You should do all of your analysis in the timer thread, store the results somewhere that the main thread can access it, and then send a message to the window proc, telling it to update the display.
Have you asked the users to disable Aero/WDMDWM? With Aero enabled, rendering is implemented quite different. Without Aero, the behaviour will be similar to XP. Not that it solves anything, but it will give you a clue as to what the problem is.
精彩评论