Implementing "Generator" support in a custom language
I've got a bit of fettish for language design and I'm currently playing around with my own hobby language. (http://rogeralsing.com/2010/04/14/playing-with-plastic/)
One thing that really makes my mind bleed is "generators" and the "yield" keyword. I know C# uses AST transformation to transform enumerator methods into statemachines.
But how does it work in other languages? Is there any way to get generator support in a language w/o AST transformation? e.g. Does languages like Python or Ruby resort to AST transformations to solve this to?
开发者_JAVA百科(The question is how generators are implemented under the hood in different languages, not how to write a generator in one of them)
Generators are basically semi-coroutines with some annoying limitations. So, obviously, you can implement them using semi-coroutines (and full coroutines, of course).
If you don't have coroutines, you can use any of the other universal control flow constructs. There are a lot of control flow constructs that are "universal" in the sense that every control flow construct (including all the other universal control flow constructs), including coroutines and thus generators can be (more or less) trivially transformed into only that universal construct.
The most well-known of those is probably GOTO
. With just GOTO
, you can build any other control flow construct: IF-THEN-ELSE
, WHILE
, FOR
, REPEAT-UNTIL
, FOREACH
, exceptions, threads, subroutine calls, method calls, function calls and so on, and of course also coroutines and generators.
Almost all CPUs support GOTO
(although in a CPU, they usually call it jmp
). In fact, in many CPUs, GOTO
is the only control flow construct, although today native support for at least subroutine calls (call
) and maybe some primitive form of exception handling and/or concurrency primitive (compare-and-swap) are usually also built in.
Another well-known control flow primitive are continuations. Continuations are basically a more structured, better manageable and less evil variant of GOTO
, especially popular in functional languages. But there also some low-level languages that base their control flow on continuations, for example the Parrot Virtual Machine uses continuations for control flow and I believe there are even some continuation-based CPUs in some research lab somewhere.
C has a sort-of "crappy" form of continuations (setjmp
and longjmp
), that are much less powerful and less easy to use than "real" continuations, but they are plenty powerful enough to implement generators (and in fact, can be used to implement full continuations).
On a Unix platform, setcontext
can be used as a more powerful and higher level alternative to setjmp
/longjmp
.
Another control flow construct that is well-known, but doesn't probably spring to mind as a low-level substrate build other control flow constructs on top of, are exceptions. There is a paper that shows that exceptions can be more powerful than continuations, thus making exceptions essentially equivalent to GOTO
and thus universally powerful. And, in fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode to JavaScript, used JavaScript exceptions to implement .NET threads and generators.
Not universal, but probably powerful enough to implement generators is just plain tail call optimization. (I might be wrong, though. I unfortunately don't have a proof.) I think you can transform a generator into a set of mutually tail-recursive functions. I know that state machines can be implemented using tail calls, so I'm pretty sure generators can, too, since, after all, C# implements generators as state machines. (I think this works especially well together with lazy evaluation.)
Last but not least, in a language with a reified call stack (like most Smalltalks for example), you can build pretty much any kind of control flow constructs you want. (In fact, a reified call stack is basically the procedural low-level equivalent to the functional high-level continuation.)
So, what do other implementations of generators look like?
Lua doesn't have generators per se, but it has full asymmetric coroutines. The main C implementation uses setjmp
/longjmp
to implement them.
Ruby also doesn't have generators per se, but it has Enumerator
s, that can be used as generators. Enumerator
s are not part of the language, they are a library feature. MRI implements Enumerator
s using continuations, which in turn are implemented using setjmp
/longjmp
. YARV implements Enumerator
s using Fiber
s (which is how Ruby spells "coroutines"), and those are implemented using setjmp
/longjmp
. I believe JRuby currently implements Enumerator
s using threads, but they want to switch to something better as soon as the JVM gains some better control flow constructs.
Python has generators that are actually more or less full-blown coroutines. CPython implements them using setjmp
/longjmp
.
精彩评论