How do I design and implement a programming language? [closed]
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this questionThis question is related to
- This question on Aardvark
- This question on here
The past couple of years I've been thinking about things I like and don't like about languages I use. I always wanted to write my own language, but never did so.
I also own both the Lego RCX and NXT, but most of the time I never actually make my robots do anything because开发者_如何学运维 of their restrictive visual programming environments.
I think I will design my programming language for the NXT because there are already tons of general purpose languages and the NXT gives me a concrete set of problems and goals and hopefully a nice sandbox to play with.
Now what? Where do I start? What do I need to know?
If possible, I'd write the compiler in Python or Clojure. There is an SDK for the NXT, but also an Assembly language. What would be the best/easiest route?
The Lego NXT has a small screen, USB and Bluetooth, it has 4 sensor ports both digital and analogue, 3 output ports and 2 ARM processors, one main processor and one co-processor. http://mindstormsnxt.blogspot.com/2006/08/whats-inside-nxt-brick.html
Programming the NXT is going to all about handling data and events, so some sort of monoiconic dataflow/reactive style seem appropriate. It should also handle parallel tasks well, so I'm thinking functional. I'm currently thinking of stack based as well.
In my head I'm already trying to unify these concepts and think of sample code. I'm thinking of a tree rather than a stack, where functional branches can run in parallel. An example:
# implicit main stack
5 5 +
# 10
# quoted branch or list
[1 -]
# 10 [1 -]
# eval list and recur until false
loop
# [9 8 7 6 5 4 3 2 1 0]
# define stack as a function
[1 = [1 8 motor] [1 0 motor] if] fn
# [9 8 7 6 5 4 3 2 1 0] <function>
# define function as a symbol
"handle-press" def
# [9 8 7 6 5 4 3 2 1 0]
# reactively loop over infinite lazy stack returned by sensor
# in a parallel branch
|4 sensor handle-press for|
# [9 8 7 6 5 4 3 2 1 0] [8 nil nil nil 8 ...]
There are obviously still gaping holes in the reasoning behind this, but I'm posting this rough sketch anyway to spark some helpful answers and discussion.
Now what? Where do I start? What do I need to know?
Start by learning more programming languages.
After learning several languages, buy a book on compilers. There are many. Google will help. It doesn't matter which one you buy. You'll need several. It's okay to read many books.
Once you've learned languages and read up on compilers, do the following.
Build the run-time libraries you need. Implement them in some suitable language like C or Python or whatever.
Once you have run-time libraries which really work. Really totally work. Totally. You can think about syntax and lexical scanning and compiling. Those are hard problems, but not half as hard as getting your run-time libraries to work.
Fooling around with syntax (i.e., a Domain Specific Language) is an attractive nuisance. Many people have "improved" syntax but no usable run-time libraries. So their "language" is incomplete because it doesn't do anything.
Get your language to do something first.
Don't afraid to write a compiler, which compiles to an existing language, and not to object code. For example, Lightweight C++ is a C++ -> C compiler is based on this idea (altough, C++ does the same job somewhere): http://linux.wareseeker.com/Programming/lightweight-c-1.3.2.zip/331414
If you have a small-but-smart idea on how to improve programming, it's a quick win way.
There's a similar situation with search engines. If I say, that I can do better than Google, maybe I can do it with a Google mashup, which reorganizes Google's result set, and I don't need to buy 343 Zigabytes of storage to set up a second Google just for changing the number of results from 10 to 15. (Unfortunatelly, it does not works if I have different ranking or crawling ideas.)
Maybe, Twitter is a better example. Write your own Twitter by using Twitter API. (Of course, only if your idea fits into the Twitter's base model.)
We're now working on a dataflow engine (see Wikipedia: flow-based programming, dataflow programming). We've developed a very lite new language, which has 3 instruction types (component creation, parameter setting, message declaration), and 2 block types (component declaration and implementation). It's compiled to C++ code, so the compiler is simple, and result is optimal fast. Also, there're several cases, when our language script is generated from configurations, or, more elegant, it supports metaprogramming.
We should break off the 1-step (source->executable) and 0-step (the source script is the executable) complilation languages; 3-4 level is easy yet to overview, and - if we do it right - it can make the developement more effective.
The easiest route is using a concatenative programming language, like Forth, Factor, or your own design of one.
The Forth interpreter is very easy to implement and does not need to take up more than a few KB; important for the Lego device. You need to understand how the Forth interpreter works. This is covered, for instance, in chapter 9 of Starting Forth.
Read fun books about language design!
the author of Clojure recommended following the book "lisp in small Pieces" by Christian Queinnec. The Clojure Reading list covers many books that incluenced the design of the Clojure language.
精彩评论