How should a multi-threaded C application handle a failed malloc()?
A part of an application I'm working on is a simple pthread-based server that communicates over a TCP/IP socket. I am writing it in C because it's going to be running in a memory constrained environment. My question is: what should the program do if one of the threads encounters a malloc() that returns NULL? Possibilities I've come up with so far:
- No special handling. Let malloc() return NULL and let it be dereferenced so that the whole thing segfaults.
- Exit immediately on a failed malloc(), by calling abort() or exit(-1). Assume that the environment will clean everything up.
- Jump out of the main event loop and attempt to pthread_join() all the threads, then shut down.
The first option is obviously the easiest, but seems very wrong. The second one also seems wrong since I don't know exactly what will happen. The third option seems tempting except for two issues: first, under normal circumstances the threads are never joined back to the main thread, and second, in order to finish executing, most of the remaining threads will have to call malloc() again anyway.
What shall I do?
This is one of the reasons that space / rad-hard systems generally forbid dynamic memory allocation. When malloc() fails, it's extremely hard to 'cure' the failure. You do have some options:
- You are not required to use the built-in libc malloc() (at all, or as usual). You can wrap malloc() to do extra work on failure, such as notifying something else. This is helpful when using something like a watchdog. You can also use a full-blown garbage collector, though I don't recommend it. It's better to identify and fix leaks.
- Depending on storage and complexity, infrequently accessed allocated blocks could be mapped to disk. But here, typically, you are only looking at a few KB of savings in physical memory.
- You can use a static pool of memory and your own malloc() that won't oversell it. If you have profiled your heap usage extensively (using a tool like Valgrind's massif or similar), you can reasonably size the pool.
However, what most of those suggestions boil down to is not trusting / using the system malloc() if failure is not an option.
In your case, I think the best thing you can do is make sure a watchdog is notified in the event that malloc() fails, so that your process (or the whole system) can be restarted. You don't want it looking 'alive and running' while deadlocked. This could be as simple as unlinking a file.
Write very detailed logs: in what file / line / function did the failure happen?
If malloc() fails when trying to get just a few KB, it's a good sign that your process really can't continue reliably anyway. If it fails grabbing a few hundred MB, you may be able to recover and keep going. By that token, whatever action you take should be based on just how much memory you were trying to get, and on whether calls to allocate a much smaller size still succeed.
The one thing you never want to do is just operate on NULL pointers and let the program crash. It's just sloppy, provides no useful logging of where things went wrong, and gives the impression that your software is of low / unstable quality.
There's nothing wrong with option 2. You don't have to assume anything: exit() exits the process, which means all the threads are torn down and everything is cleaned up.
Don't forget to try to log where the failed allocation occurred.
There's a fourth option: free some memory (caches are always good candidates) and try again.
If you cannot afford this, I'd choose option 2 (logging or printing some kind of error message, obviously). The only concern about cleanup would be closing the open network connections in an orderly manner, so the clients know that the application on the other side is shutting down rather than seeing an unexpected connectivity problem.
Depends on your architecture, I think. Does the failing malloc() mean that just that thread can't continue, or is the entire process borked in that circumstance?
Generally, when memory is really tight (e.g. in microprocessor environments) it is a good idea to avoid ALL dynamic memory allocation, to avoid issues like this.
From personal experience, I can tell that the frequency of malloc() failures is often overestimated. On Linux, for instance, the usual "solution" is a variant of option 2, and you don't even get a malloc() failure: the process just suddenly dies. On larger systems the application tends to die because a user or watchdog kills it, once the swapping has made it unresponsive.
This makes cleanup a bit harder, and it also makes it hard to come up with a general solution.
Is this running on an OS? The use of pthreads suggests so. Do you even know that malloc() will ever return NULL? On some systems (Linux, for example) the fault will occur within malloc() and will be handled by the OS (by killing the process) without malloc() ever returning.
I would suggest that you allocate a memory pool at initialisation of your application and allocate from that rather than using malloc() after initialisation. This will give you control over the memory allocation algorithm and the behaviour when memory is exhausted. If there is insufficient memory for the pool, there will be a single point of failure at initialisation before your app has had a chance to start anything it cannot finish.
In real-time and embedded systems it is common to use a 'fixed-block memory allocator'. If your OS does not provide such services, one can be implemented by pre-allocating memory blocks and placing their pointers on a queue. To allocate a block you take a pointer from the queue, and to release it you place it back on the queue. When the queue is empty, memory is exhausted, and you can either baulk and handle the error, or block and wait until another thread returns some memory. You may want to create multiple pools with different sized blocks, or even create a pool for a specific purpose with blocks the precise size needed for that purpose.