Application crash with no explanation

2023-02-15 13:11 问答作者：

I'd like to apologize in advance, because this is not a very good question.

I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.

When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.

Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.

All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.

The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this开发者_Go百科, but this isn't happening.

The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).

We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.

What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.

If your app is evaporating an not generating a dump file, then it is likely that an exception is being generated which your app doesnt (or cant) handle. This could happen in two instances:

1) A top-level exception is generated and there is no matching catch block for that exception type.

2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged, This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.

A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:

"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"

-or-

"My program might crash, but if it does I want to create a dump file on the way down."

The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.

The latter is a noble & robust approach, but the implementation of it is much more difficult that it might seem, and it fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.

Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler, and register it with Windows, via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a seperate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful no not call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.

I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?

Analyze memory in a signal handler

http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx

This isn't a very good answer, but hopefully it might help you.

I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):

Dereferencing potentially invalid pointers can disable stack expansion in other threads. A thread exhausting its stack, when stack expansion has been disabled, results in the immediate termination of the parent process, with no pop-up error window or diagnostic information.

As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.

There's some technincal background in Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly

Late response, but maybe it helps someone: every Windows app has a limit on how many handles can have open at any time. We had a service not releasing a handle in some situation, the service would just disappear, after a few days, or at times weeks (depending on the usage of the service). Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)

继续阅读：crash

Application crash with no explanation

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？