How to replace macros with a grammar-based parser?

2023-04-04 17:09 Author:

I need a parser for an exotic programming language. I wrote a grammar for it and used a parser generator (PEGjs) to generate the parser. That works perfectly... except for one thing: macros (that replace a placeholder with predefined text). I don't know how to integrate this into a grammar. Let me illustrate the problem:

An example program to be parsed typically looks like this:

instructionA parameter1, parameter2
instructionB parameter1
instructionC parameter1, parameter2, parameter3

No problem so far. But the language also supports macros:

Define MacroX { foo, bar }
instructionD parameter1, MacroX, parameter4

Define MacroY(macroParameter1, macroParameter2) {
  instructionE parameter1, macroParameter1
  instructionF macroParameter2, MacroX
}

instructionG parameter1, MacroX
MacroY

Of course I could define a grammar to identify Macros and references to Macros. But in that case I don't know how I would parse the contents of a Macro, because it's not clear what the macro contains. It could be just one parameter (that's easiest), but it could also be several parameters in one macro (like MacroX in my example, which represents two parameters) or a whole block of instructions (like MacroY). And Macros can even contain other Macros. How do I put this into a grammar if it's not clear what the macro is semantically?

The easiest approach seems to be to run a preprocessor first to replace all the macros and only then run the parser. But in that case the line numbers get messed up. I want the parser to generate error messages containing the line number if there is a parse error. And if I preprocess the input, the line numbers do not correspond anymore.

Help very much appreciated.

Macro processors tend not to respect the boundaries of language elements; in essence, they can (and often do) make arbitrary changes to the apparent input string.

If this is the case, you have little choice: you'll need to build a macro processor that preserves the line numbers.
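As a rough illustration of a line-preserving macro processor, here is a minimal JavaScript sketch. It assumes only the one-line, parameterless form from the question (`Define MacroX { foo, bar }`); block macros such as MacroY are not handled. The names `preprocess`, `lineMap` and `originalLine` are made up for this example and are not part of PEGjs.

```javascript
// Sketch only: handles one-line, parameterless "Define Name { body }" macros.
// All names here (preprocess, lineMap, originalLine) are illustrative.
function preprocess(source) {
  const macros = new Map(); // macro name -> replacement text
  const outLines = [];      // expanded program, definitions removed
  const lineMap = [];       // lineMap[i] = 1-based source line of outLines[i]

  // Replace every whole word that names a known macro with its body.
  const expand = (text) =>
    text.replace(/\b\w+\b/g, (word) => (macros.has(word) ? macros.get(word) : word));

  source.split("\n").forEach((line, index) => {
    const def = /^Define\s+(\w+)\s*\{(.*)\}\s*$/.exec(line);
    if (def) {
      // Expand the body now, so a macro may refer to earlier macros.
      macros.set(def[1], expand(def[2].trim()));
      return; // a definition produces no output line
    }
    outLines.push(expand(line));
    lineMap.push(index + 1);
  });

  return { text: outLines.join("\n"), lineMap };
}

// Translate a 1-based line number in the expanded text back to the source.
function originalLine(lineMap, expandedLine) {
  return lineMap[expandedLine - 1];
}
```

The parser then runs on `text`, and any line number it reports is passed through `originalLine` before being shown to the user.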

If the macros always contain well-structured language elements, and they always occur in structured places in the code, then you can add the notion of a macro definition and call to your grammar. This may make your parses ambiguous; foo(x) in C code might be a macro call, or it might be a function call. You'll have to resolve that ambiguity somehow. C parsers used to solve such ambiguity problems by collecting symbol table information as they parsed; if you collect is-foo-a-macro as you parse, then you can determine whether foo(x) is a macro call or not.
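The symbol-table idea can be sketched as follows, using the question's language rather than C. This is an assumption-laden toy: it looks only at the first word of each line, and a macro used before its definition would be misclassified as an instruction. The function name `classifyLines` and the `kind` labels are invented for the example.

```javascript
// Sketch: record macro names as definitions are seen, then use that table
// to decide whether a line starts with a macro call or an instruction.
function classifyLines(lines) {
  const macroNames = new Set(); // the "is-foo-a-macro" symbol table
  const result = [];

  for (const line of lines) {
    const def = /^Define\s+(\w+)/.exec(line);
    if (def) {
      macroNames.add(def[1]);
      result.push({ kind: "definition", name: def[1] });
      continue;
    }
    const head = /^(\w+)/.exec(line.trim());
    if (!head) continue; // blank or unrecognised line, ignored in this sketch
    const kind = macroNames.has(head[1]) ? "macroCall" : "instruction";
    result.push({ kind, name: head[1] });
  }
  return result;
}
```

The same lookup works for `foo(x)` in a C-like language: the parser consults the table when it meets `foo` and picks the macro-call or function-call production accordingly.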

With PEG you have to manually define the places where you can check for macro expansions. You can add your macro to a hash and check for it in the PEG rule(s) that allow macros (infix expr, postfix expr, unop, binop, function call, ...). It's not as easy as in lisp, but much easier than with YACC and its operator precedence hacks :)
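A minimal sketch of the hash lookup, written as a plain JavaScript helper that a parameter-list rule could call from its action. It assumes parameters have already been parsed into `{ name, line }` objects and that `macros` maps a macro name to its parsed parameter list; both shapes, and the name `expandParameters`, are assumptions for illustration. Self-referencing macros would recurse forever here.

```javascript
// Sketch: expand macro references inside an already-parsed parameter list.
// Expanded parameters keep the line of the macro *reference*, so later
// error messages still point at the user's source line.
function expandParameters(params, macros) {
  const out = [];
  for (const param of params) {
    if (!macros.has(param.name)) {
      out.push(param); // ordinary parameter, keep as is
      continue;
    }
    // Macro reference: splice in its (recursively expanded) parameters.
    for (const inner of expandParameters(macros.get(param.name), macros)) {
      out.push({ name: inner.name, line: param.line, viaMacro: param.name });
    }
  }
  return out;
}
```

Because the expansion happens on parsed nodes rather than on raw text, the grammar only has to say where a macro reference is allowed, and each such rule decides what a macro may expand to in that position.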

Other known PEG frameworks that allow macros, like parrot, perl6, katahdin or PFront, use the trick of executing the parse at run-time, trading away performance. Or you can do both and allow pre-compiled and interpreted PEG parsing. Several projects have explored that, but you need a fast VM, like luajit, java, clr or friends.

I use special syntax-block keywords to load external shared libraries containing a pre-compiled PEG parser, e.g. to parse SQL or FFI declarations into your AST. But you can also require a C compiler and compile the parser at run-time for all macros.

With PEG it is significantly easier than with anything else. First of all, Packrat-based parsers and the like are extensible. Your macro definition can modify the syntax, so the next time it is used it will be parsed naturally. See here and here for some extreme examples of this approach.

Another possibility is to chain parsers, which is also trivial with PEG-based approaches.
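One way to picture chained parsers is as two small stages, where the first tags every piece of text with its source line and the second parses each piece and reports errors against that line. The sketch below is hand-written JavaScript, not generated by PEGjs, and the stage names (`lineStage`, `instructionStage`, `parseProgram`) are hypothetical; a macro-handling stage could be slotted in between the two.

```javascript
// Stage 1: split the source into non-empty lines, remembering where each came from.
function lineStage(source) {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((item) => item.text !== "");
}

// Stage 2: parse "name param, param, ..." and report errors by source line.
function instructionStage(items) {
  return items.map((item) => {
    const match = /^(\w+)(?:\s+(.*))?$/.exec(item.text);
    if (!match) {
      throw new Error(`Parse error on line ${item.line}: ${item.text}`);
    }
    const params = match[2] ? match[2].split(/\s*,\s*/) : [];
    return { line: item.line, name: match[1], params };
  });
}

// The chain: each stage consumes the previous stage's output.
function parseProgram(source) {
  return instructionStage(lineStage(source));
}
```

Since the line number travels with the data from stage to stage, later stages can rewrite or expand the text freely without losing the position needed for error messages.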
