How should I construct and walk the AST output of an ANTLR3 grammar?

2023-03-05 08:33 问答作者：

Documentation and general advice is that the abstract syntax tree should omit tokens that have no meaning. ("Record the meaningful input tokens (and only the meaningful tokens" - The Definitive ANTLR Reference) IE: In a C++ AST, you would omit the braces at the start and end of the class, as they have no meaning and are simply a mechanism for delineating class start and end for parsing purposes. I understand that, for the sake of rapidly and efficiently walking the tree, culling out useless token nodes like that is useful, but in order to appropriately colorize code, I would need that information, even if it doesn't contribute to the meaning of the code. A) Is there any reason why I shouldn't have the AST serve multiple purposes and choose not to omit said tokens?

It seems to me that what ANTLRWorks interpreter outputs is what I'm looking for. In the ANTLRWorks interpreter, it outputs a tree diagram where, for each rule matched, a node is made, as well as a child node for each token and/or subrule. The parse tree, I suppose it's called.

If manually walking a tree, wouldn't it be more useful to have nodes marking a rule? By having a node marking a rule, with it's subrules and tokens as children, a manual walker doesn't need to look ahead several nodes to know the context of what node it's on. Tree grammars seem redundant to me. Given a tree of AST nodes, the tree grammar "parses" the nodes all over again in order to produce some other output. B) Given that the parser grammar was responsible for generating correctly formed ASTs and given the inclusion of rule AST nodes, shouldn't a manual walker avoid the redundant AST node pattern matching of a tree grammar?

I fear I'm wildly misunderstanding the tree grammar mechanism's purpose. A tree grammar more or less defines a set of methods that will run through a tree, look for a pattern of nodes that matches the tree grammar rule, and execute some action based on that. I can't depend on forming my AST output based on what's neat and tidy for the tree grammar (omitting meaningless tokens for pattern matching speed) yet use the AST for color-coding even the meaningless tokens. I'm writing an IDE as well; I also can't write every possible AST node pattern that a plugin author might want to match, nor do I want to requir开发者_开发百科e them using ANTLR to write a tree grammar. In the case of plugin authors walking the tree for their own criteria, rule nodes would be rather useful to avoid needing pattern matching.

Thoughts? I know this "question" might push the limits of being an SO question, but I'm not sure how else to formulate my inquiries or where else to inquire.

Sion Sheevok wrote:

A) Is there any reason why I shouldn't have the AST serve multiple purposes and choose not to omit said tokens?

No, you mind as well keep them in there.

Sion Sheevok wrote:

It seems to me that what ANTLRWorks interpreter outputs is what I'm looking for. In the ANTLRWorks interpreter, it outputs a tree diagram where, for each rule matched, a node is made, as well as a child node for each token and/or subrule. The parse tree, I suppose it's called.

Correct.

Sion Sheevok wrote:

B) Given that the parser grammar was responsible for generating correctly formed ASTs and given the inclusion of rule AST nodes, shouldn't a manual walker avoid the redundant AST node pattern matching of a tree grammar?

Tree grammars are often used to mix custom code in to evaluate/interpret the input source. If you mix this code inside the parser grammar and there's some backtracking going on in the parser, this custom code could be executed more than it's supposed to. Walking the tree using a tree grammar is (if done properly) only possible in one way causing custom code to be executed just once.

But if a separate tree walker/iterator is necessary, there are two camps out there that advocate using tree grammars and others opt for walking the tree manually with a custom iterator. Both camps raise valid points about their preferred way of walking the AST. So there's no clear-cut way to do it in one specific way.

Sion Sheevok wrote:

Thoughts?

Since you're not evaluating/interpreting, you mind as well not use a tree grammar.

But to create a parse tree as ANTLRWorks does (which you have no access to, btw), you will need to mix AST rewrite rules inside your parser grammar though. Here's a Q&A that explains how to do that: How to output the AST built using ANTLR?

Good luck!

继续阅读：abstract-syntax-tree antlr compiler-construction parsing

How should I construct and walk the AST output of an ANTLR3 grammar?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？