ANTLR - Implicit AND Tokens In Tree
I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:
cheese and crackers = cheese AND crackers
(up and down) or (left and right) = (up AND down) OR (left AND right)
cat dog “potbelly pig” = cat AND dog AND “potbelly pig”
I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:
grammar SearchEngine;
options { language = CSharp2; output = AST; }
@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'开发者_JS百科n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE : ('"');
fragment SPACE : (' '|'\n'|'\r'|'\t'|'\u000C');
WS : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (CHARACTER)+;
startExpression : andExpression;
andExpression : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression : notExpression (OR^ notExpression)*;
notExpression : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
Since your AND-rule has the optional AND-keyword, you should create an imaginary AND-token and use a rewrite-rule to "inject" that token in your tree. In this case, you can't make use of ANTLR's short-hand ^
root-operator. You'll have to use the ->
rewrite operator.
Your andExpression
should look like:
andExpression
: (andnotExpression -> andnotExpression)
(AND? a=andnotExpression -> ^(AndNode $andExpression $a))*
;
A detailed description of this (perhaps cryptic) notation is given in Chapter 7, section Rewrite Rules in Subrules, page 173-174 of The Definitive ANTLR Reference by Terence Parr.
I ran a quick test to see if the grammar produces the proper AST with the new andExpression
rule. After parsing the string cat dog "potbelly and pig" and FOO
, the generated parser produced the following AST:
alt text http://img580.imageshack.us/img580/7370/andtree.png
Note that the AndNode
and Root
are imaginary tokens.
If you want to know how to create the AST picture above, see this thread: Visualizing an AST created with ANTLR (in a .Net environment)
EDIT
When parsing both one two three
and (one two) three
, the following AST is created:
alt text http://img203.imageshack.us/img203/2558/69551879.png
And when parsing (one two) OR three
, the following AST is created:
alt text http://img340.imageshack.us/img340/8779/73390353.png
which seems to be the proper way in all cases.
精彩评论