开发者

Transforming Lisp to C++

I am working on a toy language that compiles to C++ based on lisp (very small subset of scheme), I am trying to figure out how to represent let expression,

(let ((var 10)
      (test 12))
  (+ 1 1)
  var)

At first I thought execute all exprs then return the last one but retu开发者_StackOverflowrning will kill my ability to nest let expressions, what would be the way to go for representing let?

Also, any resources on source to source transformation is appriciated, I have googled but all I could fing is the 90 min scheme compiler.


One way to expand let is to treat it as a lambda:

((lambda (var test) (+ 1 1) var) 10 12)

Then, transform this to a function and a corresponding call in C++:

int lambda_1(int var, int test) {
    1 + 1;
    return var;
}

lambda_1(10, 12);

So in a larger context:

(display (let ((var 10)
               (test 12))
           (+ 1 1)
           var))

becomes

display(lambda_1(10, 12));

There are a lot more details, such as needing to access lexical variables outside the let from within the let. Since C++ doesn't have lexically nested functions (unlike Pascal, for example), this will require additional implementation.


I'll try to explain a naive approach to compiling nested lambdas. Since Greg's explanation of expanding let into a lambda is very good, I won't address let at all, I'll assume that let is a derived form or macro and is expanded into a lambda form that is called immediately.

Compiling Lisp or Scheme functions directly into C or C++ functions will be tricky due to the issues other posters raised. Depending on the approach, the resulting C or C++ won't be recognizeable (or even very readable).

I wrote a Lisp-to-C compiler after finishing Structure and Interpretation of Computer Programs (it's one of the final exercises, and actually I cheated and just wrote a translator from SICP byte code to C). The subset of C that it emitted didn't use C functions to handle Lisp functions at all. This is because the register machine language in chapter 5 of SICP is really lower level than C.

Assuming that you have some form of environments, which bind names to values, you can define the crux of function calling like this: extend the environment which the function was defined in with the formal parameters bound to the arguments, and then evaluate the body of the function in this new environment.

In SICP's compiler, the environment is held in a global variable, and there are other global variables holding the argument list for a function call, as well as the procedure object that is being called (the procedure object includes a pointer to the environment in which it was defined), and a label to jump to when a function returns.

Keep in mind that when you are compiling a lambda expression, there are two syntactic components you know at compile-time: the formal parameters and the body of the lambda.

When a function is compiled, the emitted code looks something like this pseudocode:

some-label
*env* = definition env of *proc*
*env* = extend [formals] *argl* *env*
result of compiling [body]
...
jump *continue*

... where *env* and *argl* are the global variables holding the environment and argument list, and extend is some function (this can be a proper C++ function) that extends the environment *env* by pairing up names in *argl* with values in [formals].

Then, when the compiled code is run, and there is a call to this lambda somewhere else in your code, the calling convention is to put the result of evaluating the argument list into the *argl* variable, put the return label in the *continue* variable, and then jump to some-label.

In the case of nested lambdas, the emitted code would look something like this:

some-label
*env* = definition env of *proc*
*env* = extend [formals-outer] *argl* *env*
another-label
*env* = definition env of *proc*
*env* = extend [formals-inner] *argl* *env*
result of compiling [body-inner]
...
jump *continue*
rest of result of compiling [body-outer]
... somewhere in here there might be a jump to another-label
jump *continue*

This is a bit difficult to explain, and I'm sure I've done a muddled job of it. I can't think of a decent example that doesn't involve me basically sloppily describing the whole chapter 5 of SICP. Since I spent the time to write this answer, I'm going to post it, but I'm very sorry if it's hopelessly confusing.

I strongly recommend SICP and Lisp in Small Pieces.

SICP covers metacircular interpretation for beginners, as well as a number of variants on the interpreter, and a byte code compiler which I have managed to obfuscate and mangle above. That's just the last two chapters, the first 3 chapters are just as good. It's a wonderful book. Absolutely read it if you haven't yet.

L.i.S.P includes a number of interpreters written in Scheme, a compiler to byte code and a compiler to C. I'm in the middle of it and can say with confidence it's a deep, rich book well worth the time of anyone interested in the implementation of Lisp. It may be a bit dated by this point, but for a beginner like me, it's still valuable. It's more advanced than SICP though, so beware. It incudes a chapter in the middle on denotational semantics which went basically right over my head.

Some other notes:

Darius Bacon's self-hosting Lisp to C compiler

lambda lifting, which is a more advanced technique that I think Marc Feeley uses


If you're looking for tools to help with source-to-source translation, I'd recommend ANTLR. It is most excellent.

However, you'll need to think about how to translate from a loosely-typed language (lisp) to a less-loosely-typed language (c). For example, in your question, what is the type of 10? A short? An int? A double?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜