Writing part of a compiler (written in C++) in Perl
I am trying to learn more about compilers and programming languages. Unfortunately, my university doesn't offer a course on compilers, so I have to teach myself (thank you, internet).
At the moment I'm trying to understand and implement a lexer for my language, and I need regular expressions.
I'm used to writing Perl regexes pretty quickly, so I thought I could embed Perl in my C++ lexer. Now my questions are:
- Will it cause heavy overhead?
- Should I try to make peace with Boost (or any other C++ library that is good for regex)?
Thank you for reading this :)
Embedding Perl in your project just to do regular expressions would be like trying to stuff an elephant into a Miata to get more trunk space. (Badump!)
Boost would be one way to handle regular expressions, or if you're writing in an environment that supports POSIX.2, look into the regcomp(), regexec() and regfree() functions.
After you've written your own lexer, investigate a tool called lex, which is pretty much the gold standard for developing lexical analyzers. It has a partner called YACC for developing parsers. Both are time-tested and generate tight, bug-free code. (GNU-ish environments call these programs flex and bison.)
There's no reason you can't; part of being a good programmer is using the right tool for the job, and Perl is VERY good at text processing.
However, instead of thinking about stuffing a Perl-based lexer into your C++ compiler (written in C++, not compiling C++, I hope), you should think about writing a Perl module in C++ and letting the compiler driver be written in Perl: the driver does the lexing, fills in the data structures, and then calls the C++ module's functions to finish the compile.
If all you really want is Perl-style regular expressions, look into libpcre. It's very well tested, very portable, and in my experience easy to work with. Highly recommended. (And probably already on your machine. :)
See the bottom of the "What good is \G in a regular expression?" section of perlfaq6. It describes how //gc can be used to create a tokeniser aka lexer.