How can a language be interpreted by itself (like Rubinius)?
I've been programming in Ruby for a while now with just the standard MRI implementation of Ruby, but I've always been curious about the other implementations I hear so much about.
I was reading about Rubinius the other day, a Ruby interpreter written in Ruby. I tried looking it up in various places, but I was having a hard time figuring out exactly how something like this works. I've never had much experience with compilers or language writing, but I'm really interested in figuring it out.
How exactly can a language be interpreted by itself? Is there a basic step in compiling that I don't understand where this makes sense? Can someone explain this to me like I'm an idiot? (That wouldn't be too far off base anyway.)
It's simpler than you think.
Rubinius is not 100% written in Ruby, just mostly.
From http://rubini.us/
A large aspect of popular languages such as C and Java is that the majority of the functionality available to the programmer is written in the language itself. Rubinius has the goal of adding Ruby to that list. Rubyists could more easily add features to the language, fix bugs, and learn how the language works. Wherever possible Rubinius is written in Ruby. Where not possible (yet), it's C++.
The concept you are looking for is compiler bootstrapping.
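Basically, bootstrapping means writing a compiler (or an interpreter) for language X in language X itself. The very first version has to be built some other way: either by writing a basic compiler by hand in a lower-level language (e.g. writing a C compiler in assembly) or by writing it in a different high-level language. Once that first compiler works, it can compile the compiler written in language X, and from then on the language can compile itself.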
Read more about bootstrapping on wikipedia. Greg's answer regarding meta-circular evaluators is also highly recommended, including the relevant chapter in SICP.
In the case of Rubinius, the VM is written in C++ and handles all the low-level (operating-system-related) work and the basic operations. The VM has its own bytecode format (just as the JVM has its own), and starting Rubinius starts the VM, which executes that bytecode. Most of Rubinius's standard library (which is part of the Ruby language) is implemented in Ruby, whereas MRI implements it in C and JRuby in Java. The Rubinius bytecode compiler is written in Ruby as well. So yes, early on they had to use the standard Ruby interpreter (MRI) to bootstrap Rubinius. That shouldn't be necessary anymore, although I'm not sure whether you might still need it, since the build system uses rake.
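To make the compiler/VM split concrete, here is a toy sketch in Python (not Ruby, and not Rubinius's real bytecode). A small "compiler" turns an arithmetic expression into a made-up stack bytecode, and a separate "VM" loop executes it. In Rubinius the VM role is played by C++ and the compiler role by Ruby; here both halves are Python only to keep the example short. The opcode names (`PUSH`, `ADD`, `MUL`, ...) and the functions `compile_expr` and `run` are hypothetical.

```python
import ast

# Hypothetical opcode names for a toy stack machine (not Rubinius's real bytecode).
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(source):
    """The 'compiler' half: turn an arithmetic expression into toy bytecode."""
    bytecode = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            bytecode.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)   # operands are pushed first...
            emit(node.right)
            bytecode.append((OPCODES[type(node.op)], None))  # ...then combined
        else:
            raise SyntaxError(f"unsupported syntax: {type(node).__name__}")

    emit(ast.parse(source, mode="eval").body)
    return bytecode


def run(bytecode):
    """The 'VM' half: execute toy bytecode using a value stack."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        elif op == "DIV":
            stack.append(left / right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


code = compile_expr("1 + 2 * 3")
print(code)       # [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL', None), ('ADD', None)]
print(run(code))  # 7
```

In this toy, the VM only has to understand a handful of very simple instructions, so everything that produces those instructions (the compiler) can live in the higher-level language.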
Suppose the language you are working with is Lisp, though it doesn't matter which language it is. (It could be C++, Java, Ruby, anything.)
You already have an implementation of Lisp. Call this implementation Imp (just a made-up name, short for IMPlementation). Since Imp is a program in its own right, your computer can run it. Now you write your own implementation of Lisp, written in Lisp, and call it Circ. Circ is just an ordinary program written in Lisp, so Imp can compile (or interpret, if you will) it. Your code reads in a file, parses it (processes it into meaningful data), and does something with that data. What is that something? In the case of Circ, it executes the data.
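The read-and-parse steps can be sketched in Python (standing in for Lisp here). This is an illustration under stated assumptions, not how any particular implementation works: it borrows Python's own `ast` module as the parser and converts the result into plain nested tuples such as `('+', 1, ('*', 2, 3))`, a made-up representation. The names `to_data` and `read_program` are hypothetical.

```python
import ast
import pathlib
import tempfile

# Hypothetical mapping from Python's AST operator classes to plain symbols.
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_data(node):
    """Convert an expression AST into nested (operator, left, right) tuples."""
    if isinstance(node, ast.Expression):
        return to_data(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
        return (SYMBOLS[type(node.op)], to_data(node.left), to_data(node.right))
    raise SyntaxError(f"unsupported syntax: {type(node).__name__}")


def read_program(path):
    """Read the source file, then parse it into data the program can work with."""
    text = pathlib.Path(path).read_text()
    return to_data(ast.parse(text, mode="eval"))


with tempfile.TemporaryDirectory() as tmp:
    source_file = pathlib.Path(tmp) / "program.txt"
    source_file.write_text("1 + 2 * 3")
    print(read_program(source_file))  # ('+', 1, ('*', 2, 3))
```

Once the program text has become data like this, executing it is a matter of walking that data structure.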
But how does it do so?
Suppose, for a simple case, that the code Circ reads in and parses does something simple, like some math, and outputs the result. Circ processes the code into easy-to-use data (for a language like Lisp it's easy to begin with, but that's beside the point) and stores it. In Lisp you can write code that crunches numbers, so Circ, being written in Lisp, can crunch numbers too. So the processed data is plugged into some addition-handling code... and voilà! You have the numerical result! Then your Circ program outputs the result.
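A minimal sketch of that arithmetic case, in Python rather than Lisp, assuming the source has already been parsed into nested tuples of the form `(operator, left, right)` (a made-up representation for this example; `evaluate` and `PRIMITIVES` are hypothetical names). The thing to notice is that the evaluator does no arithmetic of its own: when it sees `+`, it hands the work to the host language's addition, just as Circ leans on the Lisp it is written in.

```python
import operator

# Operator symbols in the parsed data map to the host language's own arithmetic,
# so this evaluator never implements addition or multiplication itself.
PRIMITIVES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expr):
    """Evaluate parsed data such as ('+', 1, ('*', 2, 3))."""
    if isinstance(expr, (int, float)):
        return expr  # a number evaluates to itself
    op, left, right = expr
    return PRIMITIVES[op](evaluate(left), evaluate(right))


print(evaluate(("+", 1, ("*", 2, 3))))  # 7
```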
The same thing can be done with things more complex than simple math. In fact, you can compile or interpret the other aspects of the language the same way. Write enough of these 'other aspects', glue them together, and you get a compiler for Lisp written in Lisp.
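Here is a slightly bigger sketch of the same idea, again in Python rather than Lisp: an evaluator for a small subset of Python expressions (variables, comparisons, conditional expressions, lambdas and calls) that is itself written in Python. It borrows Python's `ast` module as its parser, an assumption made to keep it short; a real self-hosted implementation would have its own parser. Loosely following the eval/apply structure described in SICP chapter 4, `evaluate` handles each kind of syntax and `apply_procedure` handles calls. All names here (`Closure`, `evaluate`, `apply_procedure`, `run`) are hypothetical, and the host Python interpreter plays the role of Imp.

```python
import ast
import operator

BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv}
CMPOPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


class Closure:
    """A user-defined function: parameter names, body AST, defining environment."""

    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env


def evaluate(node, env):
    """Evaluate one AST node in an environment (a dict of name -> value)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return BINOPS[type(node.op)](evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in CMPOPS:
        return CMPOPS[type(node.ops[0])](evaluate(node.left, env),
                                         evaluate(node.comparators[0], env))
    if isinstance(node, ast.IfExp):
        # Only the chosen branch is evaluated, which is what makes recursion terminate.
        branch = node.body if evaluate(node.test, env) else node.orelse
        return evaluate(branch, env)
    if isinstance(node, ast.Lambda):
        return Closure([a.arg for a in node.args.args], node.body, env)
    if isinstance(node, ast.Call):
        fn = evaluate(node.func, env)
        args = [evaluate(a, env) for a in node.args]
        return apply_procedure(fn, args)
    raise SyntaxError(f"unsupported syntax: {type(node).__name__}")


def apply_procedure(fn, args):
    """Call either a user-defined closure or a host-language function."""
    if isinstance(fn, Closure):
        # New scope: parameters bound to arguments, on top of the defining scope.
        return evaluate(fn.body, {**fn.env, **dict(zip(fn.params, args))})
    return fn(*args)  # a primitive supplied by the host language


def run(source, env):
    """Parse a single expression and evaluate it in the given environment."""
    return evaluate(ast.parse(source, mode="eval"), env)


print(run("x * x + 1", {"x": 4}))       # 17
print(run("abs(3 - 10)", {"abs": abs}))  # 7
# Recursion without named definitions: the function is passed to itself.
print(run("(lambda f: f(f, 5))(lambda self, n: 1 if n < 2 else n * self(self, n - 1))", {}))  # 120
```

Each new `if` branch in `evaluate` is one more "aspect" of the language; the host's own numbers, comparisons and functions do the real work underneath.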
Since the compiler is compiled by Imp, it can be run by your machine, and presto! You are done.
An interpreter written this way is generally called a metacircular evaluator; the technique was first introduced several decades ago in the context of Lisp.
A good description of the technique can be found in Structure and Interpretation of Computer Programs, chapter 4.