How to implement a platform-independent garbage collector?
Basically, I'm interested in writing a platform-independent garbage collector in C, probably using the mark-and-sweep algorithm or one of its common variants. Ideally, the interface would work along the following lines:
(1) gc_alloc()
allocates memory
(2) gc_realloc()
reallocates memory
(3) gc_run()
runs the garbage collector.
I've already taken a look at the libgc
garbage collection library developed by Boehm et. al., but it isn't platform-independent; it's just been ported to many different syste开发者_JAVA百科ms. I'd like to implement a garbage collector that contains no system-dependent code. Speed isn't a huge issue.
Any suggestions?
Unfortunately, it's not really possible to make a truly platform-independent garbage collector in C. A strict reading of the C standard allows any type (except unsigned char
) to have trap bits - bits which, when they have the wrong value, result in the system signalling an exception (ie, undefined behavior). When scanning allocated blocks for pointers, you have no way of determining whether a particular block of memory contains a legal pointer value, or if it'll trap as soon as you try to look at the value in it.
Examining pointers as ints doesn't help either - no int type is required to have representation compatible with a pointer. intptr_t
is only available on very recent compilers, and I don't believe its representation is required to be compatible either. And ints can have trap bits as well.
You also don't know the alignment requirements of pointers. On a platform where pointers have no alignment requirements (ie, can start at any byte) this means you need to stop at every byte, memcpy
to a suitable pointer type, and examine the result. Oh, and different pointer types can have different representations as well, which is also unavoidable.
But the bigger issue is finding the root set. Bohem GC and the others tend to scan the stack, as well as static data, for pointers that should go in the root set. This is impossible without knowledge of the OS's memory layout. So you will need to have the user explicitly mark members of the root set, which sort of defeats the point of a garbage collector.
So, in short, you can't make a GC in truly portable C. In principle you can if you make a few assumptions:
- Assume the root set will be explicitly given to you by the user.
- Assume there are no trap bits in pointer or int representations.
- Assume
intptr_t
is available or assume allvoid *
s are strictly ordered (ie,<
and>
work reasonably with pointers from differentmalloc
ations) - Assume all data pointer types have representation compatible with
void *
. - Optional, but gives a big speed boost: Hardcode the alignment of pointers (this is far from universal, and will need to be compiler- and platform-specific) This assumption will let you skip
memcpy
ing pointers to a known-aligned location, and will also cut down in the number of potential pointers to examine.
If you make these assumptions you should be able to make a conservative mark-sweep allocator. Use a binary tree to hold information on where allocations are, and scan over every possible aligned pointer location in allocated blocks for pointers. However, the need to explicitly provide the root set will make this all pointless - it'll be malloc
and free
all over again, except that for a certain ill-defined set of objects you can skip it. Not exactly what GC is supposed to provide, but I suppose it might have its place as, eg, part of a virtual machine (in which case the root set would be derived from information available to the virtual machine).
Note that this all applies only to conservative GCs - that is, ones which work blindly, scanning for pointers in data without knowing where it could be. If you're working on a VM, it's a lot easier - you can build a unified data type for all allocations by the VM that explicitly lists where pointers can be found. With this plus an explicit root set, you can build a non-conservative GC; this should be sufficient for building a VM or an interpreter.
For a mark-and-sweep algorithm, all you really need to do is calculate which objects are reachable from the root set, right? (It was a while ago since I dug into this...)
This could be managed by a separate object graph for GC-managed objects, and "all" you would need to do is add functions to properly manage that graph whenever you allocate or modify managed objects. If you also add reference counting for managed objects you would make it easier to calculate which ones are reachable directly from stack references.
This should probably be possible to write fairly platform-independent, though it might be arguable whether this would be a true garbage collector.
Simple pseudocode to show what I mean by reference counting and graph management:
some_object * pObject = gc_alloc(sizeof(some_object));
some_container * pContainer = gc_alloc(sizeof(some_container));
pContainer->pObject = pObject;
/* let the GC know that pContainer has a reference to pObject */
gc_object_reference(pContainer, pObject);
/* a new block of some kind */
{
/* let the GC know we have a new reference for pObject */
some_object * pReference = gc_reference(pObject);
/* do stuff */
...
/* let the GC know that this reference is no longer used */
gc_left_scope(pReference);
}
gc_left_scope(pObject);
gc_run(); /* should not be able to recycle anything, since there still is a live
* reference to pContainer, and that has a reference to pObject
*/
精彩评论