Tree structure for AST
I am trying to write an interpreter in C# and I am on a parsing stage. I figured out I have to generate Abstract Syntax Tree at this point, but I don't know how to represent it in C#.
Cur开发者_如何转开发rently I am just using List<object>
, but I have a feeling that I am doing it wrong.
Thanks a lot.
There are numerous techniques, ranging from a single "node" type — a big bucket of fields with a "type" tag — to a fine-grained hierarchy of specific classes. The important thing is to think about how to make traversal code simple and robust, because you will need to traverse this data structure over and over again.
Taking a step back, however, interpretation doesn't strictly require an AST. Many earlier interpreters would quite literally read each line of code as they encountered it, parsing and executing it on the fly, with loops, etc., implemented via a stack-based book-marking system. I suspect shell languages like bash and cmd.exe still work like this today.
I suggest that you continue using List<object>
until you understand clearly what limitations does this impose and what your requirements are. Or, since you are writing a LISP interpreter, create a pair class and use that with object
, null
being the equivalent of '()
:
public sealed class Pair
{
public object Car ;
public object Cdr ;
}
Parse your input directly into an S-expression and have your interpreter work directly with that. After all, LISP existed before ASTs!
Most AST node implementations are pretty simple.
They are a struct (ok, ok, "class") containing a node type (usually an integer), a list of children (List is OK; high performance implementations have a set of members for statistically common 1st, 2nd, 3rd children), and some additional fields to carry values specific to the AST node instance, (e.g., the value "5" for the AST node "integer constant"). To enable efficient navigation of the tree from any node back to parents, there is often a special reference back to the parent node.
What's harder is deciding what set of AST nodes you should have. For a large grammar, this is inconvenient as you'll have to define a set of several hundred of them, and there will be churn as you modify the grammar in your attempts to get it right.
A simple trick is simply define one AST node per grammar rule. (This would be called a "concrete syntax tree" by most). But it is brainless and you don't miss anything.
Our DMS Software Reengineering Toolkit follows this "simple trick" idea, generating the AST node types dirrectly from the grammar rules. It additionally optimizes: leaf AST nodes that don't carry values aren't present in the tree, nodes for unary productions aren't present in the tree, and list-forming productions have the List style of children, while other node types have the fixed slot types for children. The net result is you what is pretty close to a "abstract" syntax tree anyway. All of this is automatically constructed by DMS's parser generator so you don't have think at all.
DMS also has a full, well-test C# 4.0 front end. Once you get past the headache of defining the AST, you'll then want to analyze/transform/generate from it, and the rest of DMS will suddenly become obviously valuable.
精彩评论