How can a language be interpreted by itself (like Rubinius)?

2023-01-01 07:34

I've been programming in Ruby for a while now with just the standard MRI implementation of Ruby, but I've always been curious about the other implementations I hear so much about.

I was reading about Rubinius the other day, a Ruby interpreter written in Ruby. I tried looking it up in various places, but I was having a hard time figuring out exactly how something like this works. I've never had much experience in compilers or language writing but I'm really interested to figure it out.

How exactly can a language be interpreted by itself? Is there a basic step in compiling that I don't understand where this makes sense? Can someone explain this to me like I'm an idiot (because that wouldn't be too far off base anyway)?

It's simpler than you think.

Rubinius is not 100% written in Ruby, just mostly.

From http://rubini.us/

A large aspect of popular languages such as C and Java is that the majority of the functionality available to the programmer is written in the language itself. Rubinius has the goal of adding Ruby to that list. Rubyists could more easily add features to the language, fix bugs, and learn how the language works. Wherever possible Rubinius is written in Ruby. Where not possible (yet), it's C++.

The concept you are looking for is compiler bootstrapping.

Basically, bootstrapping means writing a compiler (or an interpreter) for language X in language X itself. The first version has to be built with something that already runs: either a basic compiler written by hand at a lower level (e.g. a C compiler written in assembly), or an implementation written in a different high-level language.

Read more about bootstrapping on Wikipedia. Greg's answer regarding meta-circular evaluators is also highly recommended, including the relevant chapter in SICP.

In the case of Rubinius, the VM is written in C++ and deals with all the low-level (operating-system-related) stuff and base operations. The VM has its own bytecode format (just as the JVM has its own), and when Rubinius is started it starts the VM, which executes the bytecode. Most of Rubinius' standard library (which is part of Ruby the language) is implemented in Ruby, however, whereas MRI implements it in C and JRuby in Java. The Rubinius bytecode compiler is also written in Ruby. So yes, at some point early on they had to use the standard Ruby interpreter (MRI) to bootstrap Rubinius. That shouldn't be the case anymore (although I'm not sure whether you might still need it, since its build system uses rake).
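To make the compiler/VM split concrete, here is a minimal sketch of a compiler written in Ruby that emits bytecode for a tiny stack machine. Everything in it is an assumption made for illustration: the instruction names (`:push`, `:add`, `:mul`) and the helpers `compile` and `run_vm` are hypothetical and have nothing to do with Rubinius' real bytecode format. The VM is also written in Ruby here only so the example runs on its own; in Rubinius that layer is the C++ part.

```ruby
# Hypothetical instruction set: [:push, n], [:add], [:mul].
# This is NOT Rubinius' actual bytecode, only an illustration.

# "Compiler": turns a nested-array expression such as [:+, 1, [:*, 2, 3]]
# into a flat list of stack-machine instructions.
def compile(expr)
  return [[:push, expr]] if expr.is_a?(Numeric)

  op, left, right = expr
  opcode = { :+ => :add, :* => :mul }.fetch(op)
  compile(left) + compile(right) + [[opcode]]
end

# "VM": executes the instructions using an operand stack.
def run_vm(bytecode)
  stack = []
  bytecode.each do |instruction, operand|
    case instruction
    when :push
      stack.push(operand)
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :mul
      right = stack.pop
      left = stack.pop
      stack.push(left * right)
    else
      raise ArgumentError, "unknown instruction: #{instruction}"
    end
  end
  stack.pop
end

bytecode = compile([:+, 1, [:*, 2, 3]])
p bytecode
puts run_vm(bytecode)
```

The point of the split is that only `run_vm` needs to exist in a lower-level language. Once it does, the compiler (and most of the standard library) can be ordinary Ruby code that gets compiled to the same bytecode.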

Suppose the language you are working with is some language, say Lisp, though it doesn't matter. (Could be C++, Java, Ruby, anything.)

Well, you have an implementation of Lisp. Call this implementation Imp (just a made-up name, short for IMPlementation). Since Imp is a program in itself, your computer can run it. Now you write your own implementation of Lisp, written in Lisp, and you call it Circ. Circ is just a program compiled (or interpreted, if you will) from Lisp code. Your code is written so that it reads in a file, parses it (processes it into meaningful data), and does something with the data. What is this something? In the case of Circ, it executes the data.

But how does it do so?

Well, suppose for a simple case that the code Circ reads in and parses does something simple, like some math followed by outputting the result. Circ processes the code into easy-to-use data (for a language like Lisp it's easy to begin with, but that's beside the point) and stores it. In Lisp you can write code to crunch numbers, so the code written for Circ can do so too, because it is written in Lisp. So the processed data is plugged into some addition-processing code... and voila! You have the numerical result! Then your Circ program outputs the result.
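The read, parse, execute steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Ruby rather than Lisp, the helper names `tokenize`, `parse` and `evaluate` are made up, and it understands nothing beyond integers, `+` and `*`. Note how the interpreted addition is carried out by the host language's own addition.

```ruby
# Split source text such as "(+ 1 (* 2 3))" into tokens.
def tokenize(source)
  source.gsub("(", " ( ").gsub(")", " ) ").split
end

# Turn the token list into nested arrays, e.g. ["+", 1, ["*", 2, 3]].
def parse(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?

  if token == "("
    list = []
    list << parse(tokens) until tokens.first == ")"
    tokens.shift # drop the closing ")"
    list
  elsif token == ")"
    raise ArgumentError, "unexpected )"
  elsif token.match?(/\A-?\d+\z/)
    Integer(token)
  else
    token # an operator name such as "+"
  end
end

# Execute the parsed data: numbers evaluate to themselves, and a list
# applies its operator to the evaluated operands.
def evaluate(node)
  return node if node.is_a?(Integer)
  raise ArgumentError, "cannot evaluate: #{node.inspect}" unless node.is_a?(Array)

  operator, *operands = node
  values = operands.map { |operand| evaluate(operand) }
  case operator
  when "+" then values.sum             # the host's addition does the work
  when "*" then values.reduce(1, :*)
  else raise ArgumentError, "unknown operator: #{operator}"
  end
end

puts evaluate(parse(tokenize("(+ 1 (* 2 3))")))
```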

The same thing can be done with more complex things than simple math. In fact, you can compile/interpret other aspects of the language in the same way. Write enough of these 'other aspects' and glue them together, and you get a compiler for Lisp written in Lisp.
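As a rough sketch of what a couple of those "other aspects" might look like, the evaluator below handles variables, conditionals and user-defined functions on top of arithmetic. It is again an illustration in Ruby, not Lisp, and it works on already-parsed nested arrays. The names `eval_expr` and `default_env` and the special forms `:if` and `:lambda` are assumptions chosen for the example.

```ruby
# Evaluate an already-parsed expression (nested arrays of numbers and
# symbols) in an environment, which is a Hash from names to values.
def eval_expr(expr, env)
  case expr
  when Numeric
    expr
  when Symbol
    env.fetch(expr) # variable lookup
  when Array
    head, *rest = expr
    case head
    when :if
      condition, consequent, alternative = rest
      eval_expr(condition, env) ? eval_expr(consequent, env) : eval_expr(alternative, env)
    when :lambda
      params, body = rest
      # A user-defined function becomes a host-language lambda that
      # evaluates the body with the parameters bound to the arguments.
      ->(*args) { eval_expr(body, env.merge(params.zip(args).to_h)) }
    else
      function = eval_expr(head, env)
      arguments = rest.map { |argument| eval_expr(argument, env) }
      function.call(*arguments)
    end
  else
    raise ArgumentError, "cannot evaluate: #{expr.inspect}"
  end
end

# Built-in operations are borrowed directly from the host language.
def default_env
  {
    :+ => ->(a, b) { a + b },
    :* => ->(a, b) { a * b },
    :< => ->(a, b) { a < b }
  }
end

square = [:lambda, [:x], [:*, :x, :x]]
puts eval_expr([square, 4], default_env)
puts eval_expr([:if, [:<, 1, 2], 10, 20], default_env)
```

Each new language feature is one more branch in `eval_expr`, implemented with the corresponding feature of the host language. That is the "glue them together" step.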

Since the compiler is compiled by Imp, it can be run by your machine, and presto! You are done.

This technique is generally called a metacircular evaluator and was first introduced several decades ago in the context of Lisp.

A good description of the technique can be found in Structure and Interpretation of Computer Programs, chapter 4.
