Is there a garbage collection algorithm that meets these requirements?
I am writing a compiler f开发者_如何学JAVAor a statically-typed object-oriented language. Currently I'm researching garbage collection algorithms to use. I'm wondering if there's a collector that is:
- Open source and documented, so that I can implement it.
- Acurrate
- Generational
- Global, i.e there is only one collector per process, as opposed to say one per thread.
- Incremental and/or concurrent, to avoid long pauses from major collections.
- Fits with this programming paradigm. An example of what doesn't would be a collector which becomes very slow in presence of destructive assignment.
Edit: To clarify, I was wondering if there's an implementable algorithm that does this, not if there's an off-the-shelf collector.
There's one not-at-all-experimental garbage collection algorithm that actually meets all your requirements: simple automatic refcounting. On the whole, refcounting really doesn't get enough credit as a viable option, but actually it works really nicely in many situations, there are never any big batch delays, and there's no need for complicated magic.
One concern is still cleaning up circular references, which you can at least leave to be done extremely rarely; app developers who care about speed can just explicitly break the loops when they need the objects to go away.
A little-appreciated feature of refcounting is that it's much more dcache-friendly than other forms of garbage collection. If you're running a loop that allocates some small temporary objects every time through the loop, a refcounting GC (or explicit memory management, of course) can reuse the same memory each time, avoiding unnecessary cache flushes. Any other kind of GC would only free up the objects periodically, resulting in a much bigger memory footprint and therefore slowness.
Refcounting is not very efficient for heavily multi-threaded systems, because you need to acquire locks every time you touch the refcount. But if you're designing a new language anyhow, there's one huge thing you can do to improve performance and reliability all over your language: prevent almost all objects from being shared between threads. ie. make sharing explicit. If you do that, you will know which objects are vs. aren't shared, and therefore which ones need to be locked when incrementing/decrementing the refcount and which can be left unlocked. When there isn't any locking, refcounting performance can be really excellent.
(I would rather let this as a comment but I have not enough rep.)
If you are looking for algorithms rather than code, I would definetely take a look on scholarly articles. I stumbled upon the Proceedings of OOPSLA 2003, and immediately I remembered of your question --- they had two sessions on Garbage Collection:
http://www.oopsla.org/oopsla2003/files/pap-session-garbage-collection-1.html
http://www.oopsla.org/oopsla2003/files/pap-session-garbage-collection-2.html
The advantage of those "big bang" moments for starting your research is that you can then use Google Scholar on any article that looks promising, and see if there are more up-to-date follow ups, by looking for a title and then clicking on the "cited by" link, for example:
http://scholar.google.com/scholar?cites=11437015034573374705&as_sdt=2005&sciodt=0,5&hl=en
(Since you have so many requirements, you might have to kiss many frogs before you find your on-the-fly collector.)
You could probably steal the garbage collector from mono, which is an open source implementation of .Net. They recently released a new GC engine that (I think) meets all the above requirements.
The problem with stealing a collector like this: garbage collectors often are tied in with the language they are written for. Good collectors for functional languages tend to act differently than collectors for imperative ones. Open source places there is probably reason to steal from:
- Mono
- Ocaml
- Python
- ...
This is (obviously) hard to answer without some better idea of the language you are hoping to host, but have you looked at the Parrot VM? PDD 9: Garbage Collection Subsystem discusses its GC and seems to hit the buzzwords you asked for, and the languages it was designed for (Perl6 primarily, with lua and a strongly typed javascript-ish thing called winxed as strong seconds) definitely have destructive assignment and Objects.
It is a VM target, though, not a standalone GC. I really doubt you'll find an off-the shelf GC (other than conservative collectors like Boehm) that isn't associated with some sort of VM, since getting it to be accurate requires more information about the stack frame than disassembly can give.
The Azul Garbage Collector is proprietary, but there's enough information available about their algorithm that you should be able to implement something like it: http://news.ycombinator.com/item?id=2022723
It definitely supports "pauseless" collection, though the complexity of doing this is a good sign of why people usually don't.
精彩评论