Building a control-flow graph from an AST with a visitor pattern using Java
I'm trying to figure out how to implement my LEParserCfgVisitor class as to build a control-flow graph from an Abstract-Syntax-Tree already generated with JavaCC. I know there are tools that already exist, but I'm trying to do it in preparation for my Compilers final.
I know I need to have a data structure that keeps the graph in memory, and I want to be able to keep attributes like IN, OUT, GEN, KILL in each node as to be able to do a control-flow analysis later on.
My main problem is that I haven't figured out how to connect the different blocks together, as to have the right edge between each blocks depending on their nature: branch, loops, etc. In other words, I haven't found an explicit algorithm that could help me build my visitor.
Here is my empty Visitor. You can see it works on basic langage expressions, like if, while 开发者_C百科and basic operations (+,-,x,^,...)
public class LEParserCfgVisitor implements LEParserVisitor
{
public Object visit(SimpleNode node, Object data) { return data; }
public Object visit(ASTProgram node, Object data) {
data = node.childrenAccept(this, data);
return data;
}
public Object visit(ASTBlock node, Object data) {
}
public Object visit(ASTStmt node, Object data) {
}
public Object visit(ASTAssignStmt node, Object data) {
}
public Object visit(ASTIOStmt node, Object data) {
}
public Object visit(ASTIfStmt node, Object data) {
}
public Object visit(ASTWhileStmt node, Object data) {
}
public Object visit(ASTExpr node, Object data) {
}
public Object visit(ASTAddExpr node, Object data) {
}
public Object visit(ASTFactExpr node, Object data) {
}
public Object visit(ASTMultExpr node, Object data) {
}
public Object visit(ASTPowerExpr node, Object data) {
}
public Object visit(ASTUnaryExpr node, Object data) {
}
public Object visit(ASTBasicExpr node, Object data) {
}
public Object visit(ASTFctExpr node, Object data) {
}
public Object visit(ASTRealValue node, Object data) {
}
public Object visit(ASTIntValue node, Object data) {
}
public Object visit(ASTIdentifier node, Object data) {
}
}
Can anyone give me a hand?
Thanks!
To do reasoning about data flows (gen/kill/use/def) you first need a control flow graph.
To construct one, at each tree node (e.g., inside each specific node visitor), build the piece of the graph that the node represents; pass the entry point arc and the exit arc for that graph to the parent "visitor". Purely independent vistors won't work because you need to pass information to parents. [You could add entry/exit arcs slots to each node that are set by the visitor, and inspected by the parent.]
Some nodes (e.g., for "assignmentstmt") will manufacture an action node referring to the AST for the assignment (think "rectangle" in a flowchart); there aren't any subgraph arcs to worry about. Some nodes (e.g., for "if") will manufacture a conditional branch node (referencing the AST node for the condition expression), (think "diamond" in a flowchart), an flow-join node, and compose a structured (if-then-else) subgraph by combining that conditional branch node with the subgraphs for the then and else clauses (represented just by then entry and exit arcs), with the flow join-node. You then pass the entry and exit arcs to this compound subgraph to the parent.
This scheme works with structured control flow. Unstructured control flow (e.g, "GOTO x") requires some funny adjustments, e.g, first building the structured part of the graph, associating generated control flow with labels, and then going back and patcing the GOTO actions to have an arc to the associated label.
Remember to worry about exceptions; they are GOTOs too, but generally to a point higher in the structured control flow graph. These are often handled by passing the innermost exception handler node down the tree; now your visitors need to look up the tree to see that most recent exception handler.
A more sophisticated scheme that uses generated vistors is called an http://en.wikipedia.org/wiki/Attribute_grammar">attribute grammar, which essentially manages all the information flows for you, by passing the values of interest (in this case, entry/exit/exception flow arcs) up and down the tree as parameters and results. You need an attribute grammar tool to do this; and you still have to specify node-building logic. We use attribute grammars and essentially the above control flow graph construction by pieces with our DMS Software Reengineering Toolkit to provide generic control flow analysis facilities for many languages.
Once you have the control flow graph, then you can implement a data flow solver of the type you describe by walking over the control flow graph. You'll need to re-visit the ASTs that the CF nodes point-to to collect the raw use/raw def information.
If your langauge has only structured control flow, then you can use the ASTs nodes to represent the control flow nodes, and compute the data flow directly.
More details on the general process can be found here.
精彩评论