开发者

Can ANTLR differentiate between lexer rules based on the following character?

For parsing a test file I'd like to allow identifier's to begin with a number.

my rule is:

ID  :   ('a'..'z' | 'A'..'Z' | '0'..'9' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '&' | '/' | '-' | '.')*
;

However I also need to match numbers in this file as well. My rule for that is:

INT :   '0'..'9'+
;

Obviously Antlr won't let me do this as INT will never be matched.

Is there a way to allow this? Specifically I'd like to match an INTEGER followed by an ID with no spac开发者_如何学Goes as just an ID and create an INT token only if it's followed by a space.

For example:

3BOB -> [ID with text "3BOB"]
3 BOB -> [INT with text "3"] [ID with text "BOB"]


Just change the order in which ID and INT tokens are defined.

grammar qqq;

// Parser's rules.

root:
    (integer|identifier)+
;

integer:
    INT {System.out.println("INT with text '"+$INT.text+"'.");}
;

identifier:
    ID {System.out.println("ID with text '"+$ID.text+"'.");}
;

// Lexer's tokens.

INT:    '0'..'9'+
;

ID:    ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')
       ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '&' | '/' | '-' | '.')*
;

WS:    ' ' {skip();}
;

UNPREDICTED_TOKEN
:
    ~(' ') {System.out.println("Unpredicted token.");}
;

The order in which tokens are defined in grammar is significant: in case a string can be attributed to multiple tokens it is attributed to that one which is defined first. In your case if you want integer '123' to be attributed to INT when it still conforms to ID -- put INT definition first.

Antlr's token matching is greedy so it won't stop on '123' in '123BOB', but will continue until non of the tokens match the string and take the last token matched. So your identifiers now can start with numbers.

A remark on tokens order can also be found in this article by Mark Volkmann.


The following minor changes in your rules should do the trick:

ID  :   ('0'..'9')* // optional numbers
        ('a'..'z' | 'A'..'Z' | '_' |  '&' | '/' | '-' | '.') // followed by mandatory character which is not a number
        ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '&' | '/' | '-' | '.')* // followed by more stuff (including numbers)
;

INT :   '0'..'9'+ // a number
;

You simply let allow your identifiers to start with an optional number and make the following characters mandatory.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜