identifier token keyword antlr parser
How do I handle the case where the token 'for' is used in two different roles in the language I want to parse, namely as a statement keyword and as a "parameter" (plain argument text), as in the following example:
echo for print example
for i in {0..10..2}
do
echo "Welcome $i times"
done
Output:
for print example
Welcome 0 times
Welcome 2 times
Welcome 4 times
Welcome 6 times
Welcome 8 times
Welcome 10 times
Thanks.
The only way I see to do this is to define an Echo rule in your lexer grammar that matches the characters echo followed by all other characters except \r and \n:
Echo
: 'echo' ~('\r' | '\n')+
;
and make sure that rule comes before the rules that match identifiers and keywords (like for).
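To see why the Echo rule keeps the for in "echo for print example" away from the For rule, here is a small Python model of the lexer rules from the demo grammar in this answer. It is not ANTLR: it assumes the simplified policy "longest match wins, and on a tie the earlier rule wins" (ANTLR 3's actual prediction is more involved), and the regexes plus the catch-all Punct rule for the '{', '..' and '}' literals are assumptions made for the sketch.

```python
import re

# Simplified lexer model (assumption): try every rule at the current position,
# keep the longest match, and let the earlier rule win when lengths are equal.
# Rule names mirror the demo grammar; Punct stands in for its '{' '..' '}' literals.
RULES = [
    ("Echo", r"echo[^\r\n]+"),  # 'echo' followed by the rest of the line
    ("For", r"for"),
    ("In", r"in"),
    ("Do", r"do"),
    ("Done", r"done"),
    ("Identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("Integer", r"[0-9]+"),
    ("NewLine", r"\r\n|\n|\r"),
    ("Space", r"[ \t]"),
    ("Punct", r"\.\.|[{}]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return a list of (rule name, matched text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first is kept.
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        if best_name != "Space":  # mirrors the {skip();} action
            tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


if __name__ == "__main__":
    source = "echo for print example\nfor i in {0..10..2}\n"
    for token in tokenize(source):
        print(token)
```

The first line comes out as a single Echo token, because "echo for print example" is longer than the "echo" an Identifier could match, so its for never reaches the For rule. On the second line, For and Identifier both match "for" with the same length, and For wins only because it is listed first.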
A quick demo of a possible start would be:
grammar Test;

parse
  :  (echo | forStatement)*
  ;

echo
  :  Echo (NewLine | EOF)
  ;

// named forStatement rather than for: with the Java target, a parser rule
// called for would generate a method named after a reserved word
forStatement
  :  For Identifier In range NewLine
     Do NewLine
     echo
     Done (NewLine | EOF)
  ;

range
  :  '{' Integer '..' Integer ('..' Integer)? '}'
  ;

Echo
  :  'echo' ~('\r' | '\n')+
  ;

For  : 'for';
In   : 'in';
Do   : 'do';
Done : 'done';

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

Integer
  :  '0'..'9'+
  ;

NewLine
  :  '\r' '\n'
  |  '\n'
  |  '\r'
  ;

Space
  :  (' ' | '\t') {skip();}
  ;
If you parse the following input:
echo for print example
for i in {0..10..2}
do
echo "Welcome $i times"
done
echo the end for now!
with this grammar, the resulting parse tree looks like this:
[Image: parse tree for the input above, http://img571.imageshack.us/img571/5713/grammar.png]
(I had to rotate the image a bit, otherwise it wouldn't be visible at all!)
HTH.
To do that, you need a semantic predicate so that the lexer only takes that rule when 'for' really is the keyword.
Details are available on the keywords as identifiers page on the ANTLR wiki.
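A semantic predicate is essentially a condition the lexer checks before committing to a rule. The Python sketch below models that idea outside ANTLR, under an assumed condition: for counts as the keyword only when it is the first word on its line, and is an ordinary Identifier anywhere else. The at_line_start flag is hypothetical, standing in for whatever state a real predicate would test; it is not taken from the ANTLR wiki.

```python
import re

WORD = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def tokenize(text):
    """Return (kind, text) pairs; 'for' is a keyword only at the start of a line."""
    tokens = []
    at_line_start = True  # hypothetical state consulted by the "predicate"
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t":
            pos += 1  # whitespace is skipped and leaves the flag unchanged
            continue
        if ch in "\r\n":
            # Simplification: "\r\n" yields two NewLine tokens in this sketch.
            tokens.append(("NewLine", ch))
            at_line_start = True
            pos += 1
            continue
        m = WORD.match(text, pos)
        if m:
            word = m.group()
            # The predicate: take the For rule only when the condition holds.
            kind = "For" if word == "for" and at_line_start else "Identifier"
            tokens.append((kind, word))
            pos = m.end()
        else:
            tokens.append(("Other", ch))  # quotes, $, braces, dots, ...
            pos += 1
        at_line_start = False
    return tokens


if __name__ == "__main__":
    for token in tokenize("echo for print example\nfor i in {0..10..2}"):
        print(token)
```

With this condition, the for after echo becomes an Identifier while the for that starts the next line becomes a For token. A real grammar would pick a predicate that matches its own notion of "statement start".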
Well, it's pretty easy; most grammars use something like this:
TOKEN_REF
: 'A'..'Z' ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
So when referring to a print statement you would do something like:
'print' (TOKEN_REF)*
And for a for statement, you just state 'for' explicitly, such as:
'for' INT 'in' SOMETHING
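Applied to the question's input, the idea is that the parser, not the lexer, decides where for acts as a keyword: in statement position it is matched literally, while inside a print-like statement it is just another argument word. Below is a minimal Python sketch of that split, assuming a hypothetical parse_line helper that splits on whitespace and treats echo like print. It also assumes that any word may be an argument, unlike TOKEN_REF above, which requires an uppercase first letter.

```python
def parse_line(line):
    """Parse one statement; the position of a word decides its role."""
    words = line.split()
    if not words:
        return None  # blank line: nothing to parse
    head, args = words[0], words[1:]
    if head in ("print", "echo"):
        # Argument position, like 'print' (TOKEN_REF)*: 'for' here is plain data.
        return ("print", args)
    if head == "for":
        # Keyword position: expect  for <var> in <range>
        if len(args) != 3 or args[1] != "in":
            raise SyntaxError(f"malformed for statement: {line!r}")
        return ("for", args[0], args[2])
    raise SyntaxError(f"unknown statement: {line!r}")


if __name__ == "__main__":
    print(parse_line("echo for print example"))
    print(parse_line("for i in {0..10..2}"))
```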