开发者

Antlr (lexer): matching the right token

In my Antlr3 grammar, I have several "overlapping" lexer rules, like this:

NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;

Although tokens like 100110 and 123 can be matched by more than one of those rules, it is always determined by context which of them it has to be. Example:

s: a | b | c ;
a: '<' NAT '>' ;
b: '{' INT '}' ;
c: '[' BITVECTOR ']' ;

The input {17} should then match {, INT, and }, but the lexer has already decided that 17 is a NA开发者_运维百科T-token. How can I prevent this behavior? The backtrack option is already set to true, but it only seems to affect parser rules.


There might be a complex way to make the lexer context-sensitive, but in general that's what you want the parser to take care of, and you want your lexer to just provide a stream of tokens. My recommendation is to refactor your lexer to return DIGITS and SIGN and let your parser work out what kind of number the digits represent by the context.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜