Antlr: transform matched input on right side of a rule
I'm trying to write a language parser and build a nice AST. In the language, a function is essentially a variable with a callable value. For example:
int f(int arg) {...};
#int(int) f: int(int arg) {...};
Both are equivalent, and I want to transform the first into the second. As you can see, the variable's type contains the parameter types without names, while the function value needs the parameter names.
So the question is: is it possible to get both (int arg) and (int) back from my rule that matches a parameter list, or alternatively to transform the first into the second on the right-hand side of the ->?
I'll add an example source and result tree below
Input:
^(FUN_DEF
^(TYPE_SIMP 'int')
'f'
^(PARAM_LIST
^(PARAM 'int' 'arg')
)
^(BLOCK ...)
)
Result:
^(VAR_DEF
^(TYPE_FUN
^(TYPE_SIMP 'int')
^(PARAM_LIST
^(PARAM 'int')
)
)
'f'
^(FUN
^(TYPE_SIMP 'int')
^(PARAM_LIST
^(PARAM 'int' 'arg')
)
^(BLOCK ...)
)
)
One possibility is to invoke a custom method in your shortFunction rule that, given the paramList tree, strips all identifiers from it, leaving only the types, and then to insert the resulting tree in the proper place of the rewrite:
shortFunction
: type ID '(' paramList ')' block
-> ^( ... {customMethod($paramList.tree)} ... )
;
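The stripping step on its own can be sketched in plain Java, independent of ANTLR. This is only an illustration: the Node class below is a hypothetical stand-in for ANTLR's CommonTree (it is not part of the ANTLR runtime), and stripIds assumes, as in the question's tree, that the first child of every PARAM node is the type.

```java
import java.util.ArrayList;
import java.util.List;

public class StripIdsSketch {
    // Hypothetical stand-in for a tree node: a label plus ordered children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) { this.text = text; }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) { children.add(child); return this; }

        // Renders a leaf as its text and an inner node as "(text child child ...)".
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Builds a new PARAM_LIST whose PARAM nodes keep only their first child
    // (assumed to be the type). The input tree is left untouched.
    static Node stripIds(Node paramList) {
        Node copy = new Node("PARAM_LIST");
        for (Node param : paramList.children) {
            Node stripped = new Node(param.text);
            stripped.add(new Node(param.children.get(0).text));
            copy.add(stripped);
        }
        return copy;
    }

    public static void main(String[] args) {
        Node params = new Node("PARAM_LIST")
            .add(new Node("PARAM").add(new Node("byte")).add(new Node("a")))
            .add(new Node("PARAM").add(new Node("int")).add(new Node("b")));
        System.out.println(stripIds(params).toStringTree());
        System.out.println(params.toStringTree());
    }
}
```

Building a fresh list instead of modifying the one passed in matters here, because the original parameter list (with names) is still needed for the function value in the same rewrite.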
A demo:
file: Fun.g
grammar Fun;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
PARAM;
PARAM_LIST;
BLOCK;
VAR_DEF;
FUN;
TYPE_FUN;
TYPE_SIMP;
}
@parser::members {
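  // Returns a new PARAM_LIST in which every PARAM keeps only its type child.
  private CommonTree stripIDs(CommonTree tree) {
    CommonTree copy = new CommonTree(new CommonToken(PARAM_LIST, "PARAM_LIST"));
    for(int i = 0; i < tree.getChildCount(); i++) {
      CommonTree temp = (CommonTree)tree.getChild(i);
      // The copy constructor duplicates the node itself, not its children.
      CommonTree child = new CommonTree(temp);
      // Child 0 of a PARAM is the type; the ID at index 1 is dropped.
      child.addChild(new CommonTree((CommonTree)temp.getChild(0)));
      copy.addChild(child);
    }
    return copy;
  }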
}
parse
: function+ EOF -> ^(ROOT function+)
;
function
: shortFunction
| longFunction
;
shortFunction
: type ID '(' paramList ')' block
-> ^(VAR_DEF ^(TYPE_FUN ^(TYPE_SIMP type) {stripIDs($paramList.tree)}) ID ^(FUN ^(TYPE_SIMP type) paramList block))
;
longFunction
: '#' t1=type '(' typeList ')' ID ':' t2=type '(' paramList ')' block
-> ^(VAR_DEF ^(TYPE_FUN ^(TYPE_SIMP $t1) typeList) ID ^(FUN ^(TYPE_SIMP $t2) paramList block))
;
paramList
: (param (',' param)*)? -> ^(PARAM_LIST param*)
;
param
: type ID -> ^(PARAM type ID)
;
typeList
: (type (',' type)*)? -> ^(PARAM_LIST ^(PARAM type)*)
;
type
: INT
| SHORT
| BYTE
;
block
: '{' '...' '}' -> ^(BLOCK '...')
;
SHORT : 'short';
BYTE : 'byte';
INT : 'int';
ID : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9')*;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
file: Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "#short(byte, int) f: short(byte a, int b) { ... } short f(byte a, int b) { ... }";
FunLexer lexer = new FunLexer(new ANTLRStringStream(source));
FunParser parser = new FunParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
If you run the main class, you will see that the input:
#short(byte, int) f: short(byte a, int b) { ... }
short f(byte a, int b) { ... }
produces two identical trees under ROOT, printed as DOT source that can be rendered with Graphviz.