ANTLR: simple example from ANTLRWorks wizard doesn't work
Grammar:
grammar test;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
start
: STRING EOF;
This grammar was generated by the ANTLRWorks wizard; I only added the 'start' rule.
Input in interpreter:
"abc"
Result in console:
[19:09:54] Interpreting...
[19:09:54] problem matching token at 1:2 MismatchedTokenException(97!=34)
[19:09:54] problem matching token at 1:3 NoViableAltException('b'@[1:1: Tokens : ( WS | STRING );])
[19:09:54] problem matching token at 1:4 NoViableAltException('c'@[1:1: Tokens : ( WS | STRING );])
[19:09:54] problem matching token at 1:5 NoViableAltException(''@[()* loopback of 11:12: ( ESC_SEQ | ~ ( '\\' | '"' ) )*])
Screenshot: http://habreffect.ru/files/200/4cac2487f/antlr.png
I'm using ANTLRWorks v1.4. I also tried from the console with ANTLR v3.2 and got the same result.
If I type "\nabc" instead of "abc", it works fine. If I move ESC_SEQ to the right-hand side of the alternation in the STRING rule, then "abc" works but "\nabc" fails.
This appears to be a bug in ANTLRWorks 1.4. You could try ANTLRWorks 1.3 (or earlier); perhaps that version works properly (I only did a quick check with v1.4).
From the console, both of your example strings ("abc" and "\nabc") are parsed without any problems. Here's my test rig and the corresponding output:
grammar test;
start
: STRING {System.out.println("parsed :: "+$STRING.text);} EOF
;
WS
: (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT
: ('0'..'9'|'a'..'f'|'A'..'F')
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
Note that the grammar is the same as yours, only formatted a bit differently.
And the "main" class:
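import org.antlr.runtime.*;

public class Demo {
    public static void main(String[] args) throws Exception {
        // args[0] is the input to parse, e.g. "abc" including the double quotes
        ANTLRStringStream in = new ANTLRStringStream(args[0]);
        // testLexer and testParser are generated by ANTLR from test.g
        testLexer lexer = new testLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        testParser parser = new testParser(tokens);
        // invoke the entry rule; its action prints the matched STRING
        parser.start();
    }
}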
Now from the console you create a parser and lexer:
java -cp antlr-3.2.jar org.antlr.Tool test.g
Compile all .java source files:
javac -cp antlr-3.2.jar *.java
and run the "main" class:
java -cp .:antlr-3.2.jar Demo \"\\nabc\"
// output: parsed :: "\nabc"
java -cp .:antlr-3.2.jar Demo \"abc\"
// output: parsed :: "abc"
(On Windows, replace the : with a ; in the commands above.)
Note that the command-line parameters above were run in Bash, where " and \ need to be escaped; this may differ on your system. But as you can see from the output, both "\nabc" and "abc" get parsed properly.
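Because the shell rewrites quotes and backslashes before Java ever sees them, it can help to check what actually arrives in args[0] before blaming the grammar. Here is a minimal sketch for that; the class name ArgDump and the helper describe are made up for illustration and are not part of ANTLR:

```java
public class ArgDump {
    // Render each character of s as char=code, separated by spaces,
    // so stray or missing quotes and backslashes become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(c).append('=').append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print one line per command-line argument as the JVM received it.
        for (String arg : args) {
            System.out.println(describe(arg));
        }
    }
}
```

Run it with the same arguments you pass to Demo. If the first and last entries are not 34 (the double quote), the shell has consumed the quotes and the lexer never gets a complete STRING. The same codes help when reading the interpreter's message: in MismatchedTokenException(97!=34), 97 is 'a' and 34 is '"'.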
ANTLRWorks is a great tool for editing grammar files, but in my experience it has quite a few odd bugs like this one. That's why I only edit grammars with it, and generate, compile and test the files from the console as shown above.
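If you want a second opinion that doesn't involve ANTLR at all, you can approximate the STRING rule with java.util.regex and see what it should accept. This is only a rough sketch: the regex is my hand translation of the rule, not something ANTLR generates, and the class name StringRuleCheck is hypothetical.

```java
import java.util.regex.Pattern;

public class StringRuleCheck {
    // Hand-written regex mirror of the lexer rule (an approximation):
    //   STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    static final Pattern STRING = Pattern.compile(
        "\""                          // opening quote
        + "(?:"
        +   "\\\\[btnfr\"'\\\\]"      // backslash + one of b t n f r " ' backslash
        +   "|\\\\u[0-9a-fA-F]{4}"    // UNICODE_ESC: four hex digits
        +   "|\\\\[0-3][0-7]{2}"      // OCTAL_ESC: three digits
        +   "|\\\\[0-7]{1,2}"         // OCTAL_ESC: one or two digits
        +   "|[^\\\\\"]"              // any char except backslash and quote
        + ")*"
        + "\"");                      // closing quote

    // True if the whole input is exactly one STRING token.
    static boolean accepts(String s) {
        return STRING.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The two inputs from the question, written as Java literals.
        String[] inputs = { "\"abc\"", "\"\\nabc\"" };
        for (String in : inputs) {
            System.out.println(in + " -> " + accepts(in));
        }
    }
}
```

Both inputs from the question are accepted by this pattern, which matches what the generated lexer does on the console and supports the idea that the grammar is fine and the interpreter is at fault.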
HTH