开发者

Compacting garbage collector implementation in C++0x

I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to it. I've been thinking about either pointer-to-pointer in the pointer type itself, or, the collector maintains a list of pointers that point to each object so that they can be modified, removing the need for a double de-ref when accessing the pointer but adding some extra overhead during collecti开发者_Go百科on and additional memory overhead. What's the best way to go here?

Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.


There's nothing straight forward about grafting on extra GC to C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.

I have actually written a gc in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly and the situation is that it is unlikely to ever exist because it has limited utility..

What you actually need is a "re-addres-me" operator. What happens is that you do not actually move objects around. You just use mmap to change the object address. This is much faster, and, in effect, it is using the VM features to provide handles.

Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do in C++ efficiently: you'd have to move to a temporary first. In C, it is much easier, you can use memmove. At some stage all the pointers to or into the moved objects have to be adjusted.

Using handles does not solve this problem, it just reduces the problem from arbitrary sized objects to constant sized ones: these are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handle from the array randomly .. you still have a problem with fragmentation.

So don't bother with handles, they don't work.

This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).

It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.

The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.

The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.

Now: the key point is: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see an value that looks like a pointer but happens to be an integer: some object won't be collected, but this is usually no big deal. But for compaction you have to adjust the pointers but you'd better not change that integer: so you have to know for sure when something is a pointer, so your collector has to be precise: the shape must be known.

In Ocaml this is relatively simple: everything is either a pointer or integer and the low bit is used at run time to tell. Objects pointed at have a code telling the type, and there are only a few types: either a scalar (don't scan it) or an aggregate (scan it, it only contains integers or pointers).


This is a pretty straight-forward question so here's a straight-forward answer:

Mark-and-sweep (and occasionally mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (avoiding double de-refs). It's also very easy to implement. Since you're not worried about collection performance impact (mark-and-sweep tends to freeze up the process in a nondeterministically), this should be the way to go.

Implementation details found at:

  • http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
  • http://www.brpreiss.com/books/opus5/html/page428.html


A nursery generation will give you the best possible allocation performance because it is just a pointer bump.

You could implement pointer updates without using double indirection by using techniques like a shadow stack but this will be slow and very error prone if you're writing this C++ code by hand.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜