开发者

Antlr Tree Pattern Matching with Rewrite Rules

I have a trivial antlr grammar for expressions like a.b.c + d.e.f:

grammar Test;
options {
    output=AST;
}
tokens {
    VARIABLE_ID;
    QUALIFIER_ID;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
DOT : '.';
WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

variable_id  : id=ID   -> VARIABLE_ID[$id];
qualifier_id  : id=ID   -> QUALIFIER_ID[$id];

expr_start : expr EOF;
expr : var (options {greedy=true;} : '+' expr)*;

var : variable_id (DOT qualifier_id)*;

Now I want to define a pattern matcher over this grammar that turns a.b.c into 0.1.2, so I define a Tree Pattern Matcher as follows

tree grammar TestWalker;
options {
    tokenVocab=Test;
    ASTLabelType=CommonTree;
    filter=true;
    backtrack=true;
}

@members {
    TokenRewriteStream tokens;

    public void setTreeNodeStream(TreeNodeStream input) {
        super.setTreeNodeStream(input);
        tokens = (TokenRewriteStream)input.getTokenStream(); 
    }
}

topdown : var;

variable_id [int i] : id=VARIABLE_ID {
    tokens.replace($id.getToken(), "" + $i);
};

qualifier_id [int i] : id=QUALIFIER_ID {
    tokens.replace($id.getToken(), "" + $i);
};

var 
@init { int index = 0; }
: variable_id[index] 
(   DOT 
    { ++index; }
    qualifier_id[index]
)*;

Then I put together a small test program:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        TestLexer lex = new TestLexer(new ANTLRInputStream(System.in));
        TokenStream tokens = new TokenRewriteStr开发者_运维知识库eam(lex);

        TestParser parser = new TestParser(tokens);
        TestParser.expr_return expr = parser.expr();

        CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)expr.getTree());
        nodes.setTokenStream(tokens);
        TestWalker walker = new TestWalker(nodes);
        walker.downup(expr.getTree());
        System.out.println(tokens.toString());
    }
}

When I run this program with basic input, I see surprising results:

a.b.c -> 0.b.c

a.b + d.e -> 0.b + 0.e

and so on. It appears that the (DOT qualifier_id)* portion of my rule never matches and I cannot figure out why. I have tried adding my rules to the topdown and the bottomup portions of the Tree Pattern Match. If I switch from a filter matcher to a whole tree matcher and add rules to branch for the '+' case appropriately it works, but when the rewrite is just a smaller fragment of a much larger grammar this becomes untenable. Any pointers would be greatly appreciated.

Update: Using antlr 3.3


The crux of the issue (as identified by @Bart) is that the parse was not properly generating an AST. The grammar needs to build the AST (note the addtional ^ tokens).

grammar Test;
options {
    output=AST;
}
tokens {
    VARIABLE_ID;
    QUALIFIER_ID;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
DOT : '.';
WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

variable_id  : id=ID   -> VARIABLE_ID[$id];
qualifier_id  : id=ID   -> QUALIFIER_ID[$id];

expr_start : expr EOF;
expr : var (options {greedy=true;} : '+'^ expr)*;

var : variable_id (DOT^ qualifier_id)*;

Then the tree pattern matcher needs to walk the AST as it is constructed. Note the structure of the expr rule and the arguments used to handle state.

tree grammar TestWalker;
options {
    tokenVocab=Test;
    ASTLabelType=CommonTree;
    filter=true;
    backtrack=true;
}

@members {
    TokenRewriteStream tokens;

    public void setTreeNodeStream(TreeNodeStream input) {
        super.setTreeNodeStream(input);
        tokens = (TokenRewriteStream)input.getTokenStream(); 
    }
}

topdown : expr[0];

variable_id returns [int r] : id=VARIABLE_ID {
    $r = 0;
    tokens.replace($id.getToken(), "" + $r);
};

qualifier_id [int i] returns [int r] : id=QUALIFIER_ID {
    $r = $i + 1;
    tokens.replace($id.getToken(), "" + $r);
};

expr [int i] returns [int r]
    : v=variable_id { $r = $v.r; }
    | ^(DOT e=expr[$i] q=qualifier_id[$e.r] { $r = $q.r; } )
    ;

Now the output will run as expected:
a.b + d.e + c.d.e.f
0.1 + 0.1 + 0.1.2.3
and the produced AST looks correct.

Antlr Tree Pattern Matching with Rewrite Rules


I am not familiar with this (relatively new) pattern-tree walking. I glanced over it on the Wiki, but it's not in my copy of the ANTLR reference.

There is something not 100% with your Test grammar though: when I generate a parser from it, I get:

java -cp antlr-3.2.jar org.antlr.Tool Test.g
warning(200): Test.g:18:46: Decision can match input such as "'+'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

and when I visualize the AST your parser produces using your class (with a small addition):

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        TestLexer lex = new TestLexer(new ANTLRStringStream("a.b + d.e"));
        TokenStream tokens = new TokenRewriteStream(lex);

        TestParser parser = new TestParser(tokens);
        TestParser.expr_return expr = parser.expr();

        CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)expr.getTree());
        nodes.setTokenStream(tokens);

        CommonTree tree = (CommonTree)expr.getTree();

        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);

        TestWalker walker = new TestWalker(nodes);
        walker.downup(tree);
        System.out.println(tokens.toString());
    }
}

I see that the input a.b + d.e produces the AST:

Antlr Tree Pattern Matching with Rewrite Rules

I imagine your tree walker is iterating over said tree, which doesn't surprise me it doesn't produce the desired result since it's just a flat list of nodes with a single root.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜