Where can I learn the basics of writing a lexer?
I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it) but this was given to us with no instruction or feedback (beyond the mark) so I didn't really learn much from it.
After searching for this topic, I can only find fairly advanced write ups which focus on areas which I feel are a few steps ahead of where I am at. I want a discussion on the basics of writing a lexer for a very simple language which I can use 开发者_StackOverflow社区as a basis for investigating tokenising more complex languages.
At this stage I'm not really interested in best practices or optimisation techniques but instead prefer a focus on the essentials. What are some good resources to get me started?
Basically there are two main approaches to writing a lexer:
- Creating a hand-written one in which case I recommend this small tutorial.
- Using some lexer generator tools such as lex. In this case, I recommend reading the tutorials to the particular tool of choice.
Also I would like to recommend the Kaleidoscope tutorial from the LLVM documentation. It runs through the implementation of a simple language and in particular demonstrates how to write a small lexer. There is a C++ and an Objective Caml version of the tutorial.
The classical textbook on the subject is Compilers: Principles, Techniques, and Tools also known as the Dragon Book. However this probably falls under the category of "fairly advanced write ups".
The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.
精彩评论