ANTLR grammar not handling my "not" operator correctly
I am trying to parse a small expression language (defined by a vendor, not by me), and everything works until I try to use the not operator, which is a tilde in this language.
My grammar has been heavily influenced by these two links (i.e. shameless cutting and pasting):
http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx
http://www.alittlemadness.com/2006/06/05/antlr-by-example-part-1-the-language
The language consists of three expression types that can be combined with and, or, and not operators, with parentheses to change precedence. The expressions are:
Skill("name") > some_number (can also be <, >=, <=, =, !=)
SkillExists("name")
LoggedIn("name") (this one can also have name@name)
This input works fine:
Skill("somename") > 1 | (LoggedIn("somename") & SkillExists("othername"))
However, as soon as I try to use the not operator I get a NoViableAltException, and I can't figure out why. I have compared my grammar to the ECalc.g one at the codeproject.com link and they seem to match, so there must be some subtle difference I can't see. This input fails:
Skill("somename") < 10 ~ SkillExists("othername")
My Grammar:
grammar UserAttribute;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
SKILL = 'Skill' ;
SKILL_EXISTS = 'SkillExists' ;
LOGGED_IN = 'LoggedIn';
GT = '>';
LT = '<';
LTE = '<=';
GTE = '>=';
EQUALS = '=';
NOT_EQUALS = '!=';
AND = '&';
OR = '|' ;
NOT = '~';
LPAREN = '(';
RPAREN = ')';
QUOTE = '"';
AT = '@';
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expression : orexpression EOF!;
orexpression : andexpression (OR^ andexpression)*;
andexpression : notexpression (AND^ notexpression)*;
notexpression : primaryexpression | NOT^ primaryexpression;
primaryexpression : term | LPAREN! orexpression RPAREN!;
term : skill_exists | skill | logged_in;
skill_exists : SKILL_EXISTS LPAREN QUOTE NAME QUOTE RPAREN;
logged_in : LOGGED_IN LPAREN QUOTE NAME (AT NAME)? QUOTE RPAREN;
skill: SKILL LPAREN QUOTE NAME QUOTE RPAREN ((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)? NUMBER*)?;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NAME : ('a'..'z' | 'A'..'Z' | '_')+;
NUMBER : ('0'..'9')+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
I have 2 remarks:
1
Since you're parsing single expressions (expression : orexpression EOF!;), the input Skill("somename") < 10 ~ SkillExists("othername") is not only invalid in your grammar, it's invalid for any expression parser I know of. A notexpression only takes a right-hand-side expression, so ~ SkillExists("othername") is a single expression and Skill("somename") < 10 is also a single expression. But between those two single expressions there is no OR or AND operator. It would be the same as evaluating the expression true false instead of true | false or true and false.
In short, your grammar disallows:
Skill("somename") < 10 ~ SkillExists("othername")
but allows for:
Skill("somename") < 10 & SkillExists("othername")
which seems logical to me.
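To make that concrete, here are the two relevant rules from the grammar in the question, annotated with an input that fits them and the one that doesn't. The comments are my reading of the rules, not output from the generated parser.

```antlr
// NOT is a prefix operator: it takes exactly one operand, on its right.
notexpression : primaryexpression | NOT^ primaryexpression;

// Two notexpressions can only be joined by an explicit AND (or by OR, one level up).
andexpression : notexpression (AND^ notexpression)*;

// fits:         Skill("somename") < 10 & ~ SkillExists("othername")
// does not fit: Skill("somename") < 10 ~ SkillExists("othername")
```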
2
I don't quite understand your skill rule (which is ambiguous, by the way):
skill
: SKILL LPAREN QUOTE NAME QUOTE RPAREN
((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)? NUMBER*)?
;
This means that the operator is optional and that there can be zero or more numbers at the end. As a result, the following inputs are all valid:
Skill("foo") = 10 20
Skill("foo") 10 20 30
Skill("foo") <
Perhaps you meant:
skill
: SKILL LPAREN QUOTE NAME QUOTE RPAREN
((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)^ NUMBER)?
;
instead? (The ? becomes a ^ and the * is removed.)
If I only change that rule and parse the input:
Skill("somename") < 10 & SkillExists("othername")
the following AST is created:
[image: AST diagram]
(as you can see, the AST needs to be better formed: i.e. you need some rewrite rules in your skill_exists, logged_in and skill rules)
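A minimal sketch of what that tree shaping could look like, assuming you want each term to become a small subtree with the keyword as root and the punctuation dropped. The tree shapes are my own choice, not something the vendor language prescribes, and I haven't run this through ANTLR. The skill rule uses the ! and ^ operators rather than ->, because the comparison already relies on ^ and, as far as I know, ANTLR 3 doesn't let you mix tree operators with a rewrite in the same alternative.

```antlr
// ^(SKILL_EXISTS NAME): keep the keyword and the name, drop parens and quotes
skill_exists
  :  SKILL_EXISTS LPAREN QUOTE NAME QUOTE RPAREN -> ^(SKILL_EXISTS NAME)
  ;

// NAME+ collects one or two names; the second is the optional part after '@'
logged_in
  :  LOGGED_IN LPAREN QUOTE NAME (AT NAME)? QUOTE RPAREN -> ^(LOGGED_IN NAME+)
  ;

// SKILL^ roots the name; a comparison operator, if present, becomes the new root
skill
  :  SKILL^ LPAREN! QUOTE! NAME QUOTE! RPAREN!
     ((GT | LT | LTE | GTE | EQUALS | NOT_EQUALS)^ NUMBER)?
  ;
```

With rules along these lines, Skill("somename") < 10 & SkillExists("othername") should come out roughly as (& (< (Skill somename) 10) (SkillExists othername)).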
EDIT
And if you want successive expressions to have implied AND tokens in between, do something like this:
grammar UserAttribute;
...
tokens {
...
I_AND; // <- added a token without any text (imaginary token)
AND = '&';
...
}
andexpression
: (notexpression -> notexpression) (AND? notexpression -> ^(I_AND $andexpression notexpression))*
;
...
As you can see, since the AND is now optional, it cannot be used inside a rewrite rule, so you'll have to use the imaginary token I_AND instead.
If you now parse the input:
Skill("somename") < 10 ~ SkillExists("othername")
you will get the following AST:
[image: AST diagram]