How to do Unicode escape decoding in Antlr tokenizer

I've created an ANTLR grammar using ANTLRWorks and built a localization tool for internal use. I would like to convert Unicode escape sequences into the actual Java character while parsing, but am unsure of the best way to do this. Here are the token definitions in my grammar. Is there some way to specify an action for the fragment UNICODE_ESC that would return the character instead of the six-character escape sequence?

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;


Michael wrote:

This is in Java, so representation shouldn't be an issue for Character or String.

Yes, but in a Java source file, Unicode escapes look exactly the same as they do in your input, so I'm not sure what you mean.

Michael wrote:

I am just wondering how to do the replacement. If it makes it easier, say I want to replace all UNICODE_ESC fragments with the character '?' while parsing.

Okay, that can be done like this:

Token : 'x' {setText("?");} ;

where Token matches the literal x, and the text of the resulting token is then replaced with ?.
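The same setText idea can be extended from a fixed '?' to the real character. Below is a minimal sketch of a helper that does the decoding, assuming the token text still contains the raw six-character escapes. The class name UnicodeUnescaper and the method unescape are hypothetical names introduced here for illustration; they are not part of ANTLR. Since a fragment rule such as UNICODE_ESC does not emit a token of its own, one plausible place to call it is an action at the end of the STRING rule, e.g. {setText(UnicodeUnescaper.unescape(getText()));}, though that placement is an assumption and has not been tested against this grammar.

```java
// Hypothetical helper, not part of the ANTLR runtime.
final class UnicodeUnescaper {
    private UnicodeUnescaper() {}

    /** Replaces each backslash-u-plus-four-hex-digits sequence with the UTF-16 code unit it names. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length() && isHex(s, i + 2, i + 6)) {
                    // Four hex digits always fit in one char (0x0000..0xFFFF).
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                // Any other escape (newline, quote, octal, escaped backslash) is copied
                // through unchanged. Copying both characters at once ensures that an
                // escaped backslash followed by a 'u' is not mistaken for a Unicode escape.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Mirrors the grammar's HEX_DIGIT fragment: 0-9, a-f, A-F only. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char h = s.charAt(k);
            boolean ok = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The Java literal below holds the seven characters: a, backslash, u, 0, 0, 4, 2, then c.
        System.out.println(unescape("a\\u0042c")); // prints aBc
    }
}
```

This sketch only handles the Unicode escapes; the other ESC_SEQ alternatives are left in the token text as written. A supplementary character written as two consecutive escapes (a surrogate pair) comes out as two chars, which is how Java strings store it anyway.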
