开发者

ANTLR - Implicit AND Tokens In Tree

I’m trying to build a grammar that interprets user-entered text, search-engine style. It will support the AND, OR, NOT and ANDNOT Boolean operators. I have pretty much everything working, but I want to add a rule that two adjacent keywords outside of a quoted string implicitly are treated as in an AND clause. For example:

cheese and crackers = cheese AND crackers

(up and down) or (left and right) = (up AND down) OR (left AND right)

cat dog “potbelly pig” = cat AND dog AND “potbelly pig”

I’m having trouble with the last one, and I’m hoping somebody can point me in the right direction. Here’s my *.g file thus far, and please be nice, my ANTLR experience spans less than a work day:

grammar SearchEngine;

options { language = CSharp2; output = AST; }

@lexer::namespace { Demo.SearchEngine }
@parser::namespace { Demo.SearchEngine }

LPARENTHESIS : '(';
RPARENTHESIS : ')';

AND    : ('A'|'a')('N'|'开发者_JS百科n')('D'|'d');
OR     : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT    : ('N'|'n')('O'|'o')('T'|'t');

fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
fragment QUOTE     : ('"');
fragment SPACE     : (' '|'\n'|'\r'|'\t'|'\u000C');

WS     : (SPACE) { $channel=HIDDEN; };
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD   : (CHARACTER)+;

startExpression  : andExpression;
andExpression    : andnotExpression (AND^ andnotExpression)*;
andnotExpression : orExpression (ANDNOT^ orExpression)*;
orExpression     : notExpression (OR^ notExpression)*;
notExpression    : (NOT^)? atomicExpression;
atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;


Since your AND-rule has the optional AND-keyword, you should create an imaginary AND-token and use a rewrite-rule to "inject" that token in your tree. In this case, you can't make use of ANTLR's short-hand ^ root-operator. You'll have to use the -> rewrite operator.

Your andExpression should look like:

andExpression
  :  (andnotExpression        -> andnotExpression)
     (AND? a=andnotExpression -> ^(AndNode $andExpression $a))* 
  ;

A detailed description of this (perhaps cryptic) notation is given in Chapter 7, section Rewrite Rules in Subrules, page 173-174 of The Definitive ANTLR Reference by Terence Parr.

I ran a quick test to see if the grammar produces the proper AST with the new andExpression rule. After parsing the string cat dog "potbelly and pig" and FOO, the generated parser produced the following AST:

alt text http://img580.imageshack.us/img580/7370/andtree.png

Note that the AndNode and Root are imaginary tokens.

If you want to know how to create the AST picture above, see this thread: Visualizing an AST created with ANTLR (in a .Net environment)

EDIT

When parsing both one two three and (one two) three, the following AST is created:

alt text http://img203.imageshack.us/img203/2558/69551879.png

And when parsing (one two) OR three, the following AST is created:

alt text http://img340.imageshack.us/img340/8779/73390353.png

which seems to be the proper way in all cases.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜