ANTLR treats part of string as a keyword
I'm currently learning ANTLR for myself. First of all, I decided to write the simplest possible grammar. There is a plain text file with directives:
pid = something.pid
log = something.log
The grammar I wrote is:
grammar TestGrammar;
options {
language = Java;
}
@header {
package test.antlr;
}
@lexer::header {
package test.antlr;
}
program
: directive+
;
directive
: pid
| log
;
pid
: PID EQ (WORD|POINT)+
;
log
: LOG EQ (WORD|POINT)+
;
WS: ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
PID
: 'pid'
;
LOG
: 'log'
;
EQ
: '='
;
POINT
: '.'
;
WORD
: ('a'..'z'|'A'..'Z'|'_')+
;
I feel I made a mistake somewhere, and ANTLR proves it by throwing a MismatchedTokenException. It treats something.pid as a directive and throws an exception.
However, I don't understand what I am doing wrong. Any help will be appreciated.
Thanks.
The lexer is a very simple object: without interference from the parser, it tokenizes the input source. So, the input:
pid = something.pid
is not tokenized as:
PID EQ WORD POINT WORD
but as:
PID EQ WORD POINT PID
That's why your rule:
pid
: PID EQ (WORD|POINT)+
;
matches "pid = something." and leaves the second "pid" in the token stream as a PID token. The parser then tries to start a new directive at that token and expects an EQ after it (hence the exception).
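The tokenization described above can be reproduced with a small hand-written sketch in Java. This is not the code ANTLR generates; the class name LexerSketch and its methods are made up for illustration, and the sketch only imitates the rules from the question under the assumption that PID and LOG win over WORD for the exact texts "pid" and "log" because they are declared first.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written imitation of the grammar's lexer rules; not ANTLR-generated code.
class LexerSketch {

    // Returns the token type names for the input, skipping whitespace like the WS rule.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // WS is sent to the hidden channel, so the parser never sees it
            } else if (c == '=') {
                tokens.add("EQ");
                i++;
            } else if (c == '.') {
                tokens.add("POINT");
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                // The lexer knows nothing about the parser's position:
                // the text "pid" becomes PID wherever it appears.
                if (text.equals("pid")) {
                    tokens.add("PID");
                } else if (text.equals("log")) {
                    tokens.add("LOG");
                } else {
                    tokens.add("WORD");
                }
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    public static void main(String[] args) {
        // prints [PID, EQ, WORD, POINT, PID]
        System.out.println(tokenize("pid = something.pid"));
    }
}
```

The last token comes out as PID rather than WORD, which is exactly the token the rule (WORD|POINT)+ cannot consume.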
A possible fix would be to do something like this:
pid
: PID EQ (word|POINT)+
;
log
: LOG EQ (word|POINT)+
;
word
: WORD
| PID
| LOG
;
Or by doing something like:
pid
: PID EQ FULL_WORD
;
log
: LOG EQ FULL_WORD
;
// ...
FULL_WORD
: WORD (POINT WORD)*
;
// ...
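To see why the first fix (the word rule that also accepts PID and LOG) gets past the exception, here is a rough hand-written recognizer over token type names. It is only a sketch: DirectiveSketch and its methods are hypothetical names, and the one-token lookahead used to decide where a directive ends (a keyword followed by EQ starts the next directive) is my assumption about how the loop should be resolved, not something taken from the generated parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written recognizer mirroring the rewritten rules; not ANTLR output.
class DirectiveSketch {

    static boolean isKeyword(String type) {
        return type.equals("PID") || type.equals("LOG");
    }

    // word : WORD | PID | LOG ;
    static boolean isWord(String type) {
        return type.equals("WORD") || isKeyword(type);
    }

    // A directive starts with a keyword that is immediately followed by EQ.
    static boolean startsDirective(List<String> tokens, int i) {
        return i + 1 < tokens.size()
                && isKeyword(tokens.get(i))
                && tokens.get(i + 1).equals("EQ");
    }

    // directive : (PID | LOG) EQ (word | POINT)+ ;
    // Returns the index just past the directive, or -1 if it does not match at pos.
    static int directive(List<String> tokens, int pos) {
        if (!startsDirective(tokens, pos)) {
            return -1;
        }
        int i = pos + 2;
        while (i < tokens.size()
                && (isWord(tokens.get(i)) || tokens.get(i).equals("POINT"))
                && !startsDirective(tokens, i)) {
            i++;
        }
        return i > pos + 2 ? i : -1; // the (...)+ loop needs at least one token
    }

    // program : directive+ ;
    // Returns the number of directives recognized, or -1 on a mismatch.
    static int program(List<String> tokens) {
        int pos = 0;
        int count = 0;
        while (pos < tokens.size()) {
            pos = directive(tokens, pos);
            if (pos < 0) {
                return -1;
            }
            count++;
        }
        return count > 0 ? count : -1;
    }

    public static void main(String[] args) {
        // Token types for "pid = something.pid" followed by "log = something.log"
        List<String> tokens = Arrays.asList(
                "PID", "EQ", "WORD", "POINT", "PID",
                "LOG", "EQ", "WORD", "POINT", "LOG");
        System.out.println(program(tokens)); // prints 2
    }
}
```

Because word accepts PID and LOG, the trailing PID in "something.pid" is consumed as part of the value instead of being left over in the token stream.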