C API design: Who should allocate? [closed]
What is the proper/preferred way to allocate memory in a C API?
I can see, at first, two options:
1) Let the caller do all the (outer) memory handling:
myStruct *s = malloc(sizeof(*s));
myStruct_init(s);
myStruct_foo(s);
myStruct_destroy(s);
free(s);
The _init and _destroy functions are necessary since some more memory may be allocated inside, and it must be handled somewhere.
This has the disadvantage of being longer, but the malloc can also be eliminated in some cases (e.g., it can be passed a stack-allocated struct):
int bar(void) {
    myStruct s;
    myStruct_init(&s);
    myStruct_foo(&s);
    myStruct_destroy(&s);
    return 0;
}
Also, it's necessary for the caller to know the size of the struct.
2) Hide mallocs in _init and frees in _destroy.
Advantages: shorter code, since the functions are going to be called anyway. Completely opaque structures.
Disadvantages: Can't be passed a struct allocated in a different way.
myStruct *s = myStruct_init();
myStruct_foo(s);
myStruct_destroy(s);
I'm currently leaning toward the first case; then again, I don't know much about C API design.
Another disadvantage of #2 is that the caller doesn't have control over how things are allocated. This can be worked around by providing an API for the client to register his own allocation/deallocation functions (like SDL does), but even that may not be sufficiently fine-grained.
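The registration workaround might look roughly like the sketch below. Every name in it (mylib_set_allocator, the two function-pointer typedefs, the counting hooks) is hypothetical and chosen for illustration; it is not SDL's actual interface.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; the library defaults to the C runtime. */
typedef void *(*mylib_malloc_fn)(size_t size);
typedef void (*mylib_free_fn)(void *ptr);

static mylib_malloc_fn g_malloc = malloc;
static mylib_free_fn g_free = free;

/* Clients call this once, before creating any objects. */
void mylib_set_allocator(mylib_malloc_fn m, mylib_free_fn f)
{
    /* Only accept a matched pair; otherwise fall back to the defaults. */
    if (m && f) {
        g_malloc = m;
        g_free = f;
    } else {
        g_malloc = malloc;
        g_free = free;
    }
}

typedef struct myStruct { int value; } myStruct;

myStruct *myStruct_create(void)
{
    myStruct *s = g_malloc(sizeof *s);
    if (s)
        s->value = 0;
    return s;
}

void myStruct_destroy(myStruct *s)
{
    if (s)
        g_free(s);
}

/* Example client-side hooks that count live allocations. */
static int live_allocations = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        live_allocations++;
    return p;
}

static void counting_free(void *ptr)
{
    live_allocations--;
    free(ptr);
}
```

Because the hooks are global, every object in the library shares one allocator, which is the coarse granularity mentioned above.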
The disadvantage of #1 is that it doesn't work well when output buffers are not fixed-size (e.g. strings). At best, you will then need to provide another function to obtain the length of the buffer first so that the caller can allocate it. At worst, it is simply impossible to do so efficiently (i.e. computing length on a separate path is overly expensive over computing-and-copying in one go).
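A minimal sketch of the "ask for the length first" pattern, built on the standard snprintf behaviour of returning the required length when given a NULL buffer of size 0. The point_format and point_format_alloc functions are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical API: formats "(x, y)" into buf, writing at most size bytes.
 * Returns the length the full string needs, excluding the terminator.
 * Passing NULL and 0 only queries the length. */
int point_format(int x, int y, char *buf, size_t size)
{
    return snprintf(buf, size, "(%d, %d)", x, y);
}

/* Caller-side helper showing the two-step dance: measure, allocate, fill. */
char *point_format_alloc(int x, int y)
{
    int len = point_format(x, y, NULL, 0);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf)
        point_format(x, y, buf, (size_t)len + 1);
    return buf; /* caller releases with free() */
}
```

Note that the formatting work is done twice here, which is exactly the cost the paragraph above warns about when computing the length is expensive.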
The advantage of #2 is that it allows you to expose your datatype strictly as an opaque pointer (i.e. declare the struct but don't define it, and use pointers consistently). Then you can change the definition of the struct as you see fit in future versions of your library, while clients remain compatible on binary level. With #1, you have to do it by requiring the client to specify the version inside the struct in some way (e.g. all those cbSize fields in Win32 API), and then manually write code that can handle both older and newer versions of the struct to remain binary-compatible as your library evolves.
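To illustrate the size-field versioning that #1 forces on you, here is a small sketch. The myOptions struct, its fields, and mylib_area are all invented for the example; only the general idea of a size field is borrowed from Win32.

```c
#include <stddef.h>

/* Hypothetical public struct. Version 1 had only cb_size and width;
 * version 2 appended height. All fields share one alignment, so the
 * v1 layout is exactly the prefix that ends where height begins. */
typedef struct myOptions {
    unsigned cb_size; /* caller stores the struct size it was compiled with */
    int width;
    int height;       /* added in v2 */
} myOptions;

#define MYOPTIONS_V1_SIZE ((unsigned)offsetof(myOptions, height))

/* Returns width * height, or -1 if cb_size matches no known version. */
int mylib_area(const myOptions *o)
{
    if (!o || o->cb_size < MYOPTIONS_V1_SIZE)
        return -1;

    /* v1 callers never wrote height, so it must not be read for them. */
    int height = (o->cb_size >= sizeof *o) ? o->height : 1;
    return o->width * height;
}
```

Every field added later needs another size check like this one, which is the manual bookkeeping an opaque pointer avoids.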
In general, if your structs are transparent data which will not change in future minor revisions of the library, I'd go with #1. If it is a more or less complicated data object and you want full encapsulation to fool-proof it for future development, go with #2.
Method number 2 every time.
Why? Because with method number 1 you have to leak implementation details to the caller. The caller has to know at least how big the struct is. You can't change the internal implementation of the object without recompiling any code that uses it.
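A sketch of how method #2 keeps the size private. The header and source parts are shown in one listing for brevity; the file names and the myStruct_get accessor are assumptions made for the example.

```c
#include <stdlib.h>

/* ---- what would go in the public header (say, mystruct.h) ---- */
typedef struct myStruct myStruct;   /* declared here, never defined here */
myStruct *myStruct_create(int initial);
int myStruct_get(const myStruct *s);
void myStruct_destroy(myStruct *s);

/* ---- what would stay private in the implementation file ---- */
struct myStruct {
    int value;  /* fields can be added or reordered freely: clients only
                   ever hold a pointer and never see sizeof(myStruct) */
};

myStruct *myStruct_create(int initial)
{
    myStruct *s = malloc(sizeof *s);
    if (s)
        s->value = initial;
    return s;
}

int myStruct_get(const myStruct *s)
{
    return s->value;
}

void myStruct_destroy(myStruct *s)
{
    free(s);
}
```

Client code that includes only the header part cannot declare a myStruct on the stack or call sizeof on it, so a new library version can change the layout without clients being recompiled.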
Why not provide both, to get the best of both worlds?
Use _init and _terminate functions to use method #1 (or whatever naming you see fit).
Use additional _create and _destroy functions for the dynamic allocation. Since _init and _terminate already exist, it effectively boils down to:
myStruct *myStruct_create (void)
{
    myStruct *s = malloc(sizeof(*s));
    if (s)
    {
        myStruct_init(s);
    }
    return (s);
}
void myStruct_destroy (myStruct *s)
{
    myStruct_terminate(s);
    free(s);
}
If you want it to be opaque, then make _init and _terminate static and do not expose them in the API, only provide _create and _destroy. If you need other allocations, e.g. with a given callback, provide another set of functions for this, e.g. _createcalled, _destroycalled.
The important thing is to keep track of the allocations, but you have to do this anyway. You must always use the counterpart of the used allocator for deallocation.
My favourite example of a well-designed C API is GTK+, which uses method #2 that you describe.
Although another advantage of your method #1 is not just that you could allocate the object on the stack, but also that you could reuse the same instance multiple times. If that's not going to be a common use case, then the simplicity of #2 is probably an advantage.
Of course, that's just my opinion :)
Both are functionally equivalent. But, in my opinion, method #2 is easier to use. A few reasons for preferring #2 over #1 are:
- It is more intuitive. Why should I have to call free on the object after I have (apparently) destroyed it using myStruct_destroy?
- It hides the details of myStruct from the user, who does not have to worry about its size, etc.
- In method #2, myStruct_init does not have to worry about the initial state of the object.
- You don't have to worry about memory leaks from the user forgetting to call free.
If your API implementation is being shipped as a separate shared library, however, method #2 is a must. To isolate your module from any mismatch in implementations of malloc/new and free/delete across compiler versions, you should keep memory allocation and de-allocation to yourself. Note, this is more true of C++ than of C.
The problem I have with the first method is not so much that it is longer for the caller; it's that the API is now handcuffed in its ability to expand the amount of memory it uses, precisely because it doesn't know how the memory it received was allocated. The caller doesn't always know ahead of time how much memory it will need (imagine if you were trying to implement a vector).
Another option you didn't mention, which is going to be overkill most of the time, is to pass in a function pointer that the API uses as an allocator. This doesn't allow you to use the stack, but does allow you to do something like replace the use of malloc with a memory pool, while still keeping the API in control of when it wants to allocate.
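A rough sketch of the allocator-as-parameter idea, with a simple bump-pointer pool standing in for the memory pool. The names my_alloc_fn, myStruct_create_with, myPool and myPool_alloc are all hypothetical, and the pool assumes its base pointer is suitably aligned (as memory from malloc is).

```c
#include <stddef.h>

/* Hypothetical allocator signature the API accepts from the caller. */
typedef void *(*my_alloc_fn)(void *ctx, size_t size);

typedef struct myStruct { int value; } myStruct;

/* The API decides when to allocate; the caller decides where from. */
myStruct *myStruct_create_with(my_alloc_fn alloc, void *ctx)
{
    myStruct *s = alloc(ctx, sizeof *s);
    if (s)
        s->value = 0;
    return s;
}

/* Example caller-side pool: hands out slices of one big block. */
typedef struct myPool {
    unsigned char *base; /* e.g. a single malloc'd block owned by the caller */
    size_t capacity;
    size_t used;
} myPool;

void *myPool_alloc(void *ctx, size_t size)
{
    /* Round each request up so later blocks stay aligned for common types. */
    const size_t align = sizeof(union { long long ll; long double ld; void *p; });
    myPool *pool = ctx;
    size_t remaining = pool->capacity - pool->used;

    if (size > remaining)
        return NULL;
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > remaining)
        return NULL; /* pool exhausted */

    void *p = pool->base + pool->used;
    pool->used += rounded;
    return p;
}
```

The pool here never frees individual objects; the caller releases the whole block at once, which is one common reason to want a pool in the first place.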
As for which method is proper API design, it's done both ways in the C standard library. strdup() and stdio use the second method, while sprintf and strcat use the first method. Personally I prefer the second method (or third) unless 1) I know I will never need to realloc and 2) I expect the lifetime of my objects to be short, so that using the stack is very convenient.
edit: There is actually one other option, and it is a bad one with a prominent precedent. You could do it the way strtok() does it, with statics. Not good, just mentioned for completeness' sake.
Both ways are OK. I tend to do the first way, as a lot of the C I do is for embedded systems and all the memory is either tiny variables on the stack or statically allocated. This way there can be no running out of memory: either you have enough at the beginning or you're screwed from the start. Good to know when you have 2K of RAM :-) So all my libraries are like #1, where the memory is assumed to be allocated.
But this is an edge case of C development.
Having said that, I'd probably go with #1 still. Perhaps using init and finalize/dispose (rather than destroy) for names.
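For the embedded case described in this answer, method #1 with purely static storage might look something like this sketch. The buffer size, myStruct_push, and the g_uart_rx instance are invented for illustration.

```c
#include <stddef.h>

typedef struct myStruct {
    unsigned char buf[32]; /* fixed capacity chosen at compile time */
    size_t len;
} myStruct;

void myStruct_init(myStruct *s)
{
    s->len = 0;
}

/* Returns 0 on success, -1 when the fixed buffer is full. */
int myStruct_push(myStruct *s, unsigned char byte)
{
    if (s->len >= sizeof s->buf)
        return -1;
    s->buf[s->len++] = byte;
    return 0;
}

/* Storage is reserved at link time, so there is no heap to run out of;
 * the only failure mode left is the explicit "buffer full" result. */
static myStruct g_uart_rx;
```

The library never calls malloc, so its total memory footprint is visible in the linker map before the program ever runs.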
That could give some food for thought:
Case #1 mimics the memory allocation scheme of C++, with more or less the same benefits:
- easy allocation of temporaries on the stack (or in static arrays or the like, to write your own struct allocator replacing malloc).
- easy freeing of memory if anything goes wrong in init
Case #2 hides more information about the structure used and can also be used for opaque structures, typically when the structure as seen by the user is not exactly the same as the one used internally by the lib (say, there could be some more fields hidden at the end of the structure).
A mixed API between case #1 and case #2 is also common: there is a parameter used to pass in a pointer to some already initialized structure; if it is null, the structure is allocated (and the pointer is always returned). With such an API, the free is usually the responsibility of the caller even if init performed the allocation.
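The mixed API could be sketched along these lines. The struct contents are placeholders; the one real design point is that the caller must remember which instances came from the heap.

```c
#include <stdlib.h>

typedef struct myStruct { int value; } myStruct;

/* If s is NULL, allocate a new object; otherwise use the caller's storage.
 * Returns the initialised object, or NULL if allocation failed. */
myStruct *myStruct_init(myStruct *s)
{
    if (!s) {
        s = malloc(sizeof *s);
        if (!s)
            return NULL;
    }
    s->value = 0;
    return s;
}
```

A caller would write myStruct_init(&local) for a stack object and myStruct_init(NULL) for a heap object, calling free only on the latter.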
In most cases I would probably go for case #1.
Both are acceptable - there are tradeoffs between them, as you've noted.
There are large real-world examples of both - as Dean Harding says, GTK+ uses the second method; OpenSSL is an example that uses the first.
I would go for (1) with one simple extension, that is to have your _init function always return the pointer to the object. Your pointer initialization then may just read:
myStruct *s = myStruct_init(malloc(sizeof(myStruct)));
As you can see, the right-hand side then only has a reference to the type and not to the variable anymore. A simple macro then gives you (2), at least partially:
#define NEW(T) (T ## _init(malloc(sizeof(T))))
and your pointer initialization reads:
myStruct *s = NEW(myStruct);
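One caveat with chaining malloc straight into _init is that malloc may return NULL. A minimal sketch of an _init that tolerates this, assuming a placeholder struct with a single field:

```c
#include <stdlib.h>

typedef struct myStruct { int value; } myStruct;

/* Returns s so calls can be chained; a NULL argument is passed through
 * untouched, so a failed malloc simply yields a NULL object. */
myStruct *myStruct_init(myStruct *s)
{
    if (s)
        s->value = 0;
    return s;
}

#define NEW(T) (T ## _init(malloc(sizeof(T))))
```

With this shape the caller checks the result of NEW(myStruct) once, exactly as it would check malloc.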
Your method #2 says:
myStruct *s = myStruct_init();
myStruct_foo(s);
myStruct_destroy(s);
Now, if myStruct_init() needs to return an error code for any reason, then do it this way:
myStruct *s;
int ret = myStruct_init(&s); // int myStruct_init(myStruct **s);
if (ret == 0) { // assuming 0 means success
    myStruct_foo(s);
    myStruct_destroy(s);
}
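A possible implementation of that signature is sketched below. The error codes are hypothetical names invented for the example, and 0 meaning success is an assumption rather than something the answer specifies.

```c
#include <stdlib.h>

typedef struct myStruct { int value; } myStruct;

/* Hypothetical status codes for this sketch. */
enum { MYSTRUCT_OK = 0, MYSTRUCT_EINVAL = 1, MYSTRUCT_ENOMEM = 2 };

/* Stores the new object in *out and returns MYSTRUCT_OK, or returns an
 * error code and leaves *out set to NULL. */
int myStruct_init(myStruct **out)
{
    if (!out)
        return MYSTRUCT_EINVAL;

    *out = malloc(sizeof **out);
    if (!*out)
        return MYSTRUCT_ENOMEM;

    (*out)->value = 0;
    return MYSTRUCT_OK;
}

void myStruct_destroy(myStruct *s)
{
    free(s);
}
```

Returning the status and passing the object back through an out-parameter lets the caller tell different failures apart, which a bare NULL return cannot do.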