AppDomain, handling the exceptions
I am developing a large application which consists of many smaller plugins/applications.
They are not big enough to justify a full process each, but they are too big to run as a mere thread inside one process. On top of that, I want the whole thing to be plugin-based: if a newer version of a plugin is available, it should be unloaded, updated and started again.
While searching for a solution I came across the magic word AppDomain, and I quote:
"Use application domains to isolate tasks that might bring down a process. If the state of the AppDomain that's executing a task becomes unstable, the AppDomain can be unloaded without affecting the process. This is important when a process must run for long periods without restarting. You can also use application domains to isolate tasks that should not share data."
That is exactly what I want. However, I guess their idea of 'state becomes unstable' is different from mine. I am thinking of a situation where one of the plugins throws an exception, for whatever reason. I would like that exception to be caught and e-mailed, and the plugin to be unloaded and restarted (if possible).
So I created an application that starts up, looks for all .dll files in its folder and checks whether each DLL contains a plugin. It creates a new AppDomain for each plugin, and once everything is loaded it starts each plugin. (Each plugin can consist of multiple threads, co-existing happily next to each other.)
I also added a timer that fires after 5 seconds and throws a new Exception(), and I subscribed to the AppDomain's UnhandledException event to handle it. The handler did catch it, but afterwards the whole process still 'crashed', including all the other child AppDomains.
But the quote clearly says 'to isolate tasks that "might" bring down a process'. So am I missing something vital? Is my reading of the quote wrong?
Since .NET 2.0, unhandled exceptions crash the process. From the AppDomain.UnhandledException event documentation:
This event provides notification of uncaught exceptions. It allows the application to log information about the exception before the system default handler reports the exception to the user and terminates the application.
The same goes for AppDomain.FirstChanceException:
This event is only a notification. Handling this event does not handle the exception or affect subsequent exception handling in any way.
You need to think about how to handle exceptions just as you would in a normal application; using AppDomains alone will not help. If an exception is not handled within a given AppDomain, it is rethrown in the calling AppDomain until it is either handled or crashes the process. It is perfectly fine to handle some exceptions and not let them crash your process.
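The thread itself contains no code, so here is a language-neutral sketch in Python (not .NET) of handling exceptions at the plugin boundary: each plugin's entry point runs on its own thread inside a wrapper that catches whatever escapes and hands it to a reporting callback, where you might log it, e-mail it and schedule a restart. The names run_plugin_guarded and report are hypothetical. Note that Python merely prints an uncaught thread exception, whereas in .NET 2.0+ it terminates the process, which is exactly why the catch belongs at this boundary.

```python
import threading
from typing import Callable


def run_plugin_guarded(
    name: str,
    entry: Callable[[], None],
    report: Callable[[str, Exception], None],
) -> threading.Thread:
    """Run one plugin's entry point on its own thread behind a catch-all.

    In .NET 2.0+ an exception that escapes a thread's entry point tears down
    the whole process, so this boundary is where it has to be caught.
    """

    def guarded() -> None:
        try:
            entry()
        except Exception as exc:  # last line of defence for this plugin
            # Hypothetical reporting hook: log it, e-mail it, schedule a restart.
            report(name, exc)

    thread = threading.Thread(target=guarded, name=f"plugin-{name}", daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    failures = []

    def flaky_plugin() -> None:
        raise RuntimeError("timer fired")  # mimics the 5-second test exception

    t = run_plugin_guarded("flaky", flaky_plugin,
                           lambda name, exc: failures.append((name, str(exc))))
    t.join()
    print(failures)  # [('flaky', 'timer fired')]
```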
An AppDomain is a logical container for assemblies and memory (not for threads). AppDomain isolation implies:
Objects created in domain A cannot be accessed directly by domain B (without marshaling). This allows domain A to be unloaded without affecting anything in domain B. These objects are deleted automatically when the 'owning' domain is unloaded.
Assemblies are unloaded automatically along with their AppDomain. This is the only way to unload a managed DLL from a process, which is useful for DLL hot-swapping.
AppDomain security permissions and configuration can be isolated from other AppDomains. This can be helpful when you load untrusted third-party code. It also lets you override how assemblies are loaded (version binding, shadow copying, etc.).
The most common reasons for using AppDomains are running untrusted third-party code, hosting the CLR from unmanaged code, and DLL hot-swapping. I think that in a CLR hosting scenario you can keep your process from crashing when third-party code throws an unhandled exception.
Also, instead of rolling your own infrastructure, you might want to look at System.AddIn or MEF.
There are two problems with an unhandled exception. AppDomain solves only one of them. You're trying to deal with the other one.
Good news first. When you handle an exception, you have to restore the program state as though the exception never happened. Everything has to be rewound to the state before the code ran. You normally have a bunch of catch and finally clauses that undo the state mutations performed by the code. None of that is simple, of course, but it is entirely impossible if the exception is unhandled: you have no idea exactly what got mutated or how to restore it. AppDomain handles this very difficult problem with aplomb. You unload it and whatever state was left is just gone. No more garbage-collected heap, no more loader heap (statics). The whole enchilada gets reset to whatever the state was before you created the AppDomain.
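The "catch and finally clauses that undo the state mutations" can be sketched in Python (a language-neutral illustration, not .NET code). The hypothetical transfer function below mutates two entries and, if the second step fails, restores the first before letting the exception propagate to the caller.

```python
def transfer(accounts: dict, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst, restoring the prior state on failure."""
    accounts[src] -= amount  # mutation 1 (a KeyError here leaves state untouched)
    try:
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount  # mutation 2 (KeyError if dst is unknown)
    except Exception:
        accounts[src] += amount  # undo mutation 1 so nothing half-done survives
        raise  # still report the failure to the caller


if __name__ == "__main__":
    books = {"cash": 100, "payables": 0}
    try:
        transfer(books, "cash", "payables", 500)
    except ValueError as exc:
        print(exc, books)  # insufficient funds {'cash': 100, 'payables': 0}
```

Every mutating code path needs this kind of bookkeeping, and when an exception goes unhandled none of it runs; unloading an AppDomain sidesteps the problem by discarding that state wholesale.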
That's great. But there's another problem that's pretty hard to deal with as well. Your program was asked to perform a job. The thread set off to do that job, but it suffered a heart attack. Big problem number one: the thread is dead. That's pretty bad news if your program had only one thread to begin with; there's no thread left, so the program terminates. Nice that the AppDomain unloaded first, but it really doesn't make any difference; it would have been unloaded anyway.
Big problem too: it was really rather important that this job got done, and it didn't. That matters. Say the job was to balance the corporate profit and loss statement. It didn't get done, so somebody is going to have to take care of it, because not balancing the statement is going to upset a lot of people.
How do you solve that?
There are only a few selected scenarios where that's acceptable. Server scenarios. Somebody asks it to do something, the server reports back "couldn't do it, please contact the system administrator". The way ASP.NET and SQL Server work. They use AppDomains to keep the server stable. And have system administrators to deal with the problems. You'll have to create that kind of support system to make AppDomains work for you.
Just adding some extra info on the subject for anybody who is considering (I've been there myself) using application domains mainly to guarantee the stability of an application:
A few years ago, the System.AddIn team published a very interesting blog entry: "Using AppDomain Isolation to Detect Add-In Failures".
It explains that only out-of-process add-ins can guarantee the stability of the host. More specifically:
Starting with the CLR v2.0 unhandled exceptions on child threads will now cause the entire process to be torn down and thus it is impossible for a host to completely recover from this.
So what they suggest is to subscribe to the AppDomain.UnhandledException event and, before your application crashes, store information about which add-in caused the exception somewhere (a log, a database, etc.). The next time your application starts, use this information to protect it: perhaps you don't load that add-in, or you inform the user and let her/him decide. (Microsoft Office applications followed this approach and disabled the plugins that crashed the host; you then had to re-enable them yourself.)
They also published another blog entry, "More on Logging UnhandledExeptions from Managed Add-Ins", that shows how to do this even when the host is itself running inside another host (IIS, WAS, etc.).
Although both these articles are centered around System.AddIn, they contain useful information for anyone trying to increase the stability of their plugin-aware application.
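A rough Python analogue of the first blog entry's suggestion (not the System.AddIn or AppDomain API): a hook writes the name of the plugin whose thread raised an uncaught exception to a JSON file, and the next startup skips every plugin listed there until it is re-enabled, much like the Office behaviour. CrashRegistry, install_crash_hook, plugins_to_start, the file location and the "plugin-<name>" thread-naming convention are all assumptions for illustration.

```python
import json
import threading
from pathlib import Path


class CrashRegistry:
    """Persists the names of plugins that crashed the host (hypothetical design)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def disabled(self) -> set[str]:
        if not self.path.exists():
            return set()
        return set(json.loads(self.path.read_text()))

    def _save(self, names: set[str]) -> None:
        self.path.write_text(json.dumps(sorted(names)))

    def record(self, plugin: str) -> None:
        self._save(self.disabled() | {plugin})

    def re_enable(self, plugin: str) -> None:
        self._save(self.disabled() - {plugin})


def install_crash_hook(registry: CrashRegistry) -> None:
    """Note which plugin's thread raised an uncaught exception.

    Assumes plugin threads are named "plugin-<name>". In .NET the process would
    still go down after AppDomain.UnhandledException runs; the hook only leaves
    a note for the next start.
    """
    previous = threading.excepthook

    def hook(args) -> None:
        name = args.thread.name if args.thread is not None else ""
        if name.startswith("plugin-"):
            registry.record(name[len("plugin-"):])
        previous(args)  # keep the default traceback output

    threading.excepthook = hook


def plugins_to_start(discovered: list[str], registry: CrashRegistry) -> list[str]:
    """On startup, skip anything that crashed last time until it is re-enabled."""
    blocked = registry.disabled()
    return [name for name in discovered if name not in blocked]
```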
AppDomain is more often used for unloading assemblies (as you're suggesting) and for controlling startup parameters such as .NET access levels, configuration, etc. If you really want 'isolation', the best bet is always a worker process; however, it's a lot more work.
I do a fair amount of this in several projects. Just to give a broad-stroke picture, we use Google Protocol Buffers (Jon Skeet's port) over a managed Windows LRPC library for most of the communication. For worker-process management we rely heavily on named events; I recently published an inter-process event library just for this purpose.
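A minimal sketch of the worker-process approach, again in Python rather than .NET: the host launches a plugin as a separate OS process and restarts it when it exits with a non-zero code, so a plugin crash never takes the host down. supervise and max_restarts are hypothetical names, and the inter-process communication mentioned above (protocol buffers over LRPC, named events) is left out.

```python
import subprocess
import sys


def supervise(cmd: list[str], max_restarts: int = 3) -> list[int]:
    """Run a plugin as its own worker process; restart it while it keeps failing.

    A crash, including an unhandled exception, only ends the worker process,
    never this host. Returns the exit code of every attempt.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit: nothing to restart
        # Non-zero exit: this is where you would log or e-mail the failure.
    return exit_codes


if __name__ == "__main__":
    # Stand-in plugin that dies with an unhandled exception (exit code 1).
    worker = [sys.executable, "-c", "raise RuntimeError('plugin crashed')"]
    print(supervise(worker, max_restarts=2))  # [1, 1, 1]
```

A production supervisor would likely also wait between restarts and give up on a plugin that fails repeatedly rather than looping forever.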