When should a C function return newly allocated memory?
In a response elsewhere, I found the following snippet:
In general it is nicer in C to have the caller allocate memory, not the callee - hence why strcpy is a "nicer" function, in my opinion, than strdup.
I can see how this is a valid pattern, but why might it be considered nicer? Are there advantages to following this pattern? Or not?
example
Recently I've written a fair amount of code that looks something like:
struct foo *a = foo_create();
// do something with a
foo_destroy(a);
If foo
is a anything more than a flat structure, then I figured I could put all my initialization in one step. Also, assume the stru开发者_JAVA技巧ct should be on the heap. Why might it be better form to do something like:
struct foo *a = malloc(sizeof(foo));
foo_init(a);
// do something with a
foo_destroy(a)
Whenever you want an opaque structure and don't want to expose its internals in the header file. Your foo_create()
example illustrates this.
Another example is the Windows API. E.g. CreateWindow
gives you a HWND
. You have no idea what the actual WND
structure looks like and can't touch its fields.
Same with kernel object handles. E.g. CreateEvent
gives a HANDLE
. You can only manipulate it with the well-defined API, and close it with CloseHandle()
.
Re:
struct foo *a = malloc(sizeof(foo));
This requires you to define struct foo
in a header, and hence expose its internals. If you want to change it down the track, you risk breaking existing code that (incorrectly) relied on its members directly.
The main advantage of having the caller allocate the memory is that it simplifies the interface, and it's completely unambiguous that the caller owns the memory. As your create/destroy example shows, the simplification is not very great.
I prefer the create/destroy convention established by Dave Hanson in C Interfaces and Implementations:
struct foo *foo_new(...); // returns result of malloc()
void foo_free(struct foo **foop); // free *foop's resources and set *foop = NULL
You follow the convention thus:
struct foo *a = foo_new();
...
foo_free(&a);
// now `a` is guaranteed to be NULL
This convention makes it a little less likely you will leave a dangling pointer.
Either approach you post is good form; the former is closer to how C++ handles things, the latter is more Objective-C like. The important thing is to balance creation and destruction within a code block. This practice falls under the category of reducing coupling. What's bad practice is to have a function that creates something and performs additional tasks, as strdup
does, which makes it harder to know if the caller has to dispose of anything without consulting the documentation.
Both approaches are perfectly fine. Consider all the FILE* manipulation functions, they're not allowing you to allocate a FILE yourself.
If you're programming in C, you'd often want full control of everything. That means giving the caller the control over where and how to allocate the structure is a good thing. Personally I usually create 2 functions for initalization if I'm not needing an opaque struct
int foo_init(struct foo *f); // allows the caller to allocate 'f'
//however is suitable
struct foo * new_foo(void); // mallocs and calls foo_init, for convenience.
And if needed, corresponding
void foo_free(struct foo *f ); //frees and destroys 'f'
void foo_destroy(struct foo *f); //cleans up whatever internal stuff 'f' has,
// but does not free 'f' itself
In cases where you want the caller to treat the structure as opaque, you'll only provide a struct foo* new_foo(void); Not exposing the struct foo implementation has some benefits:
- Caller arn't allowed to poke around or perform potential dangerous shortcuts by accessing the members directly.
- You can change struct foo without breaking existing binaries (you're not breaking the ABI), can be a big deal if you're implementing a library.
- Your public header file don't need to expose the implementation and other required headers for struct foo
The drawbacks
- The caller has no control over the allocation of a struct foo
- You'll have the overhead of always having to manipulate a struct foo through function calls
By allocating and deallocating memory in the same function (or source file), you'll have an easier time identifying potential memory leaks (or convincing yourself there are none) without having to hop around to different places in your program. If a callee allocates memory, it's ambiguous where the deallocation should take place. In contrast, by having the caller do it, that code is taking "full responsibility" for the memory.
Above all, though, consistency is the key. Pick one approach and stick with it.
My opinion on it is this - there are two ways to deal with this:
If you write a function that allocates memory, write a comment above the function to indicate that the responsibility of memory management lies with the programmer i.e. explicitly free the memory, hence passing the burden of memory management to the programmer who will be responsible.
Alternatively, write a wrapper function something like this, ending in _alloc
and a corresponding wrapper function ending in _free
, in that way, you are defining a well-documented set of routines which makes it easier for the programmer to read.
The simple advantage is this: if the programmer un-intentionally introduced a memory leak, the warning is there as the adage in C is this 'For every malloc, there should be a corresponding free, if you do not have it, then you have a leak'. The programmer in turn can clue in and say "Aha..I called this wrapper function something_alloc
but did not call something_free
". You get the gist? And anyway, the programmer will thank you for it!
Really, it is down to how well the code API is defined. If you want to write code to manage memory and hence free up the programmer from having the responsibility in managing memory, best to wrap it and give it a special meaning, as I have suggested like using an underscore followed by 'alloc' and 'free'.
This will earn you kudos and respect as the programmer who will be reading and using your code will say - 'Thanks bud' and end result is everyone will be happy.
Nicer as in Friendlier. Let the caller allocate the memory and the caller can decide HOW to allocate the memory. They can choose between stack and heap obviously. But also between multiple heaps in some cases. They can package up mulitiple allocations into a single malloc call (useful when you need to marshal the data into another address space).
In Win32, there is GlobalAlloc()
, which is the ONLY way to allocate memory that can be passed in DDE messages to other applications. (not that anyone cares anymore ;)
in Win32 we also have VirtualAlloc
which is not used very often, but has some properties that make it invaluable for some special cases. (you can change memory from read-write to read-only after you initialize it).
There's also CreateFileMapping/MapViewOfFile
, which gets you memory that is backed by a particular file - writes to the memory end up writing to the file.
I'm sure Unix has equivalent specialized memory allocation functions.
It all comes down to establishing ownership of the memory.
When projects get very large, it can become difficult to figure out where all the memory is going.
In C++, we often get around this by using a factory such as the foo_create() example. That factory knows how to setup foo objects, and can easily track how much memory it is allocating and how much it is freeing.
While something similar can be done in C, often we simply make sure that each layer of your program cleans up the memory it uses. Thus, a reviewer can glance at the code to make sure that each malloc has a matching free. When allocations are too deeply nested, it can quickly become unclear where a memory leak occurs.
By the way, I tend to lean toward having an initializer that is separate from allocation for the sake of returning an error value from the initializer. If you simply call foo_create(), and get a null pointer back, it is not clear if the creation failed due to lack of memory, or due to some other reason. Getting into the habit of having return values on init functions can save you a lot of debugging time.
I prefer the GLib style (the 1st you mentioned). For me choosing this style makes it more object oriented like. Your methods take care of creating and destroying the struct, so you don't have to fight with the internals of a structure. This aproach leads your code to have less errors as well.
A GString example:
GString *s;
s = g_string_new();
// Not in this case, but sometimes you can
// find this:
if (s == NULL)
printf("Error creating the object!");
Having the caller allocate memory is better, because you can save memory allocations by recycling old data structures manually. This is useful in mathy applications, when you have lots of N sized arrays lying around. Remember, memory allocation is quite slow.
On the other hand, if the size of the array can only be determined by the function (i.e. the size of the result is unknown) then the callee should allocate.
Whatever you do, use conventions to tell people what has happened. Big stupid names like pre_allocated_N_array
or new_result_array
(sorry, I'm not a C expert, there should be C conventions for this though) are very handy for people who are using your function without reading the docs. It all comes down to consistency.
If you allocate your memory yourself, you have control over how you do that. Either in stack, standard malloc or one of the sixteen memory managers you use in your application.
If memory is allocated for you not only you have no control over how it's done, but you should be aware of how to free the memory. Well, most of the libraries would provide you a "free" function for free.
Having said that I still don't think there's one "nicer" approach. What suits your use better.
精彩评论