From Source to AST to CodeDom
I am reading the book Language Implementation Patterns (http://pragprog.com/book/tpdsl/language-implementation-patterns), along with a few other books and the occasional website to clarify concepts. I am trying to build a tool that reads a trivial programming language and performs some basic analysis on it.
I am stuck in the design phase of this tool. I have written a simple recursive descent parser by hand, and it validates a source file just fine. However, to perform source manipulations, having a CodeDom tree would be useful.
The questions:
1) Are these the logical steps a tool like this takes: parse and build a textual tree with a matching symbol table, and then convert this to a CodeDom?
2) When building a textual tree, an AST would be the most convenient, since it is easier to convert to a CodeDom. But do refactoring tools maintain a list of all embedded tokens in a statement in order to preserve inline comments, and how do they track this in their tree?
You can build your own parser, your own tree builder, your own tree walker, your own analyzers, your own prettyprinters... but it's a lot of work.
You might consider tools that provide all this machinery for you.
One such tool is our DMS Software Reengineering Toolkit.
Given a grammar, DMS will parse and automatically build a tree; yes, it automatically captures "microtokens" such as comments and attaches them to appropriate tree nodes. It can prettyprint the tree back out, before or after transformations. You have to provide support for symbol tables yourself, since a symbol table is a semantic construct, not a syntactic one, but DMS provides generic symbol tables and scope management tools as a library to build upon. DMS also provides complete libraries for control and data flow analysis, which are needed if you want to do serious code transformations or refactorings.
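If you do roll your own, the comment-capturing idea can be sketched in a few lines. This is not how DMS does it internally and uses none of its API; it is a minimal Python illustration under my own assumptions: a toy language of `name = a + b;` statements with `/* ... */` comments, and hypothetical names (`Token`, `Node`, `tokenize`, `Parser`, `show`) invented for the example. The lexer hangs each comment on the next real token, the parser copies it onto the tree node it builds, and the prettyprinter emits it again.

```python
import re
from dataclasses import dataclass, field

# Block comment | number | identifier | any other single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(/\*.*?\*/)|(\d+)|([A-Za-z_]\w*)|(\S))", re.DOTALL)

@dataclass
class Token:
    kind: str                                     # "num", "id", "op" or "eof"
    text: str
    comments: list = field(default_factory=list)  # comments seen just before it

@dataclass
class Node:
    kind: str                                     # "program", "assign", "add", "num", "id"
    text: str = ""
    children: list = field(default_factory=list)
    comments: list = field(default_factory=list)  # comments printed before the node

def tokenize(src):
    tokens, pending = [], []
    for comment, num, ident, op in TOKEN_RE.findall(src):
        if comment:
            pending.append(comment)               # hold until the next real token
            continue
        kind, text = ("num", num) if num else ("id", ident) if ident else ("op", op)
        tokens.append(Token(kind, text, pending))
        pending = []
    tokens.append(Token("eof", "", pending))      # trailing comments land here
    return tokens

class Parser:
    def __init__(self, src):
        self.tokens, self.pos = tokenize(src), 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, *kinds):
        tok = self.peek()
        if tok.kind not in kinds:
            raise SyntaxError(f"unexpected {tok.text!r}")
        self.pos += 1
        return tok

    def punct(self, text):
        tok = self.peek()
        if tok.kind != "op" or tok.text != text:
            raise SyntaxError(f"expected {text!r}, got {tok.text!r}")
        self.pos += 1
        # Punctuation gets no node, so pass its comments on to the next token.
        self.peek().comments = tok.comments + self.peek().comments

    def program(self):
        stmts = []
        while self.peek().kind != "eof":
            stmts.append(self.assignment())
        return Node("program", children=stmts, comments=self.peek().comments)

    def assignment(self):
        target = self.take("id")
        self.punct("=")
        value = self.expr()
        self.punct(";")
        return Node("assign", target.text, [value], target.comments)

    def expr(self):
        left = self.atom()
        while self.peek().text == "+":
            self.punct("+")
            left = Node("add", "+", [left, self.atom()])
        return left

    def atom(self):
        tok = self.take("num", "id")
        return Node(tok.kind, tok.text, comments=tok.comments)

def show(node):
    """Prettyprint the tree, re-emitting the comments stored on each node."""
    if node.kind == "program":
        return "\n".join([show(s) for s in node.children] + node.comments)
    prefix = "".join(c + " " for c in node.comments)
    if node.kind == "assign":
        return f"{prefix}{node.text} = {show(node.children[0])};"
    if node.kind == "add":
        return f"{show(node.children[0])} + {show(node.children[1])}"
    return prefix + node.text

src = "/* total */ x = 1 + /* tax */ 2;\ny = x; /* done */"
print(show(Parser(src).program()))
```

The design choice here is that comments ride along as "leading trivia" on whatever node follows them, so a transformation that moves or copies a node takes its comments with it. A comment in front of punctuation is shifted to the next operand rather than kept in place, and original whitespace is not preserved; a real refactoring tool has to be more careful on both counts.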
One of DMS's nicest properties is the ability to apply transformations stated using the syntax of your grammar, e.g., "if you see this (in my language) then replace it by that".
You can see an example of defining a lexer, parser, prettyprinter and transformation rules for 9th grade algebra and a bit of calculus. The rewrite rules are used to carry out simplifications and compute symbolic derivatives of algebraic formulas.
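To give a feel for what "if you see this, then replace it by that" means, here is a rough Python sketch of pattern-directed rewriting over expression trees. It is not DMS rule syntax (DMS states rules in the surface syntax of your grammar, not as tuples) and every name in it (`RULES`, `match`, `substitute`, `rewrite`, the `"d"` node for a derivative) is made up for the illustration. Expressions are nested tuples such as `("+", a, b)`, and strings starting with `?` are pattern variables.

```python
# Each rule is (pattern, replacement). Rules are tried in order.
RULES = [
    # Algebraic simplifications.
    (("+", "?x", 0), "?x"),
    (("+", 0, "?x"), "?x"),
    (("*", "?x", 1), "?x"),
    (("*", 1, "?x"), "?x"),
    (("*", "?x", 0), 0),
    (("*", 0, "?x"), 0),
    # Symbolic derivatives; ("d", e, v) stands for "derivative of e with respect to v".
    (("d", ("+", "?a", "?b"), "?v"), ("+", ("d", "?a", "?v"), ("d", "?b", "?v"))),
    (("d", ("*", "?a", "?b"), "?v"),
     ("+", ("*", ("d", "?a", "?v"), "?b"), ("*", "?a", ("d", "?b", "?v")))),
    (("d", "?v", "?v"), 1),
    # Fallback: anything left is a constant or another variable. This relies on
    # rule order and on "+" and "*" being the only operators in the toy language.
    (("d", "?a", "?v"), 0),
]

def is_var(p):
    return isinstance(p, str) and p.startswith("?")

def match(pattern, expr, env):
    """Match expr against pattern; return the variable bindings or None."""
    if is_var(pattern):
        if pattern in env:                        # repeated variable must agree
            return env if env[pattern] == expr else None
        return {**env, pattern: expr}
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            env = match(p, e, env)
            if env is None:
                return None
        return env
    return env if pattern == expr else None       # literal operator or number

def substitute(template, env):
    """Build the replacement tree by filling in the bound variables."""
    if is_var(template):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template

def rewrite(expr):
    """Rewrite children first, then this node, until no rule applies."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite(e) for e in expr)
    for pattern, template in RULES:
        env = match(pattern, expr, {})
        if env is not None:
            return rewrite(substitute(template, env))
    return expr

# d/dx (x*x + 3*x)
formula = ("d", ("+", ("*", "x", "x"), ("*", 3, "x")), "x")
print(rewrite(formula))
```

The point of keeping the rules as data is that the engine (`match`, `substitute`, `rewrite`) never changes when you add a transformation; you only add another pattern and replacement pair. A real system also needs conditions on rules, types for pattern variables, and control over where and how often rules fire, which this sketch leaves out.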