Antlr Lexer Quoted String Predicate
I'm trying to build an ANTLR lexer that tokenizes lone words and quoted strings. So far I have the following:
STRING: QUOTE (options {greedy=false;} : . )* QUOTE ;
WS : SPACE+ { $channel = HIDDEN; } ;
WORD : ~(QUOTE|SPACE)+ ;
For the corner cases, it needs to parse:
"string" word1" word2
as three tokens: "string" as STRING, and word1" and word2 as WORD. Basically, if there is a last, unmatched quote, it needs to stay part of the WORD where it appears. If the quote is surrounded by whitespace, it should be a WORD.
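To make the expected behaviour concrete, here is a minimal reference sketch in plain Python (not ANTLR) of the tokenization described above. The function name `tokenize` and the `(type, text)` tuples are hypothetical and chosen only for illustration. It assumes that "last quote" means a quote with no other quote after it in the input.

```python
QUOTE = '"'
SPACE = " \r\t\x0c\n"  # same characters as the SPACE fragment in the grammar


def tokenize(text):
    """Split text into (type, text) tuples; whitespace is skipped."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in SPACE:
            i += 1
            continue
        if ch == QUOTE:
            close = text.find(QUOTE, i + 1)
            if close != -1:
                # A closing quote exists: quoted string up to the next quote.
                tokens.append(("STRING", text[i:close + 1]))
                i = close + 1
                continue
            # No closing quote: fall through and treat it as part of a WORD.
        start = i
        # Consume ordinary word characters (neither quote nor whitespace).
        while i < n and text[i] not in SPACE and text[i] != QUOTE:
            i += 1
        # If the word runs into the last quote of the input, absorb the quote
        # and any non-space characters that follow it.
        if i < n and text[i] == QUOTE and text.find(QUOTE, i + 1) == -1:
            i += 1
            while i < n and text[i] not in SPACE:
                i += 1
        tokens.append(("WORD", text[start:i]))
    return tokens
```

Calling `tokenize('"string" word1" word2')` on the corner case above should give one STRING token followed by two WORD tokens, the first of which keeps its trailing quote.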
I tried this rule for WORD, without success:
WORD: ~(QUOTE|SPACE)+
| (~(QUOTE|SPACE)* QUOTE ~QUOTE*)=> ~(QUOTE|SPACE)* QUOTE ~(QUOTE|SPACE)* ;
I finally found something that could do the trick without resorting to writing Java code:
fragment QUOTE
: '"' ;
fragment SPACE
: (' '|'\r'|'\t'|'\u000C'|'\n') ;
WS : SPACE+ {$channel=HIDDEN;};
PHRASE : QUOTE (options {greedy=false;} : . )* QUOTE ;
WORD : (~(QUOTE|SPACE)* QUOTE ~QUOTE* EOF)=> ~(QUOTE|SPACE)* QUOTE ~(SPACE)*
| ~(QUOTE|SPACE)+ ;
That way, the syntactic predicate on the first WORD alternative distinguishes it from both:
PHRASE : QUOTE (options {greedy=false;} : . )* QUOTE ;
and
| ~(QUOTE|SPACE)+ ;