How to force ANTLR to generate NoViableAltException?
I'm working with ANTLR 3.2. I have a simple grammar that consists of atoms (which are either the characters "0" or "1"), and a rule which accumulates a comma-separated list of them into a list.
When I pass in "00" as input, I don't get an error, which surprises me because this should not be valid input:
C:\Users\dan\workspace\antlrtest\test>java -cp antlr-3.2.jar org.antlr.Tool Test.g
C:\Users\dan\workspace\antlrtest\test>javac -cp antlr-3.2.jar *.java
C:\Users\dan\workspace\antlrtest\test>java -cp .;antlr-3.2.jar TestParser
[0]
How can I force an error to be generated in this case? It's particularly puzzling because when I use the interpreter in ANTLRWorks on this input, it does show a NoViableAltException.
I find that if I change the grammar to require, say, a semicolon at the end, an error is generated, but that solution isn't available to me in the real grammar I am working on.
Here is the grammar, which is self-contained and runnable:
grammar Test;
@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "00";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=atom {$words.add($w.text);} (',' w=atom {$words.add($w.text);} )*
  ;

atom: '0' | '1';

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
At the end of your mainRule you should add an EOF token; otherwise ANTLR stops parsing as soon as the next token cannot be matched, and the rest of the input is silently ignored.
Also, the atom rule should really be a lexer rule rather than a parser rule (lexer rule names start with a capital letter).
Try this instead:
grammar Test;
@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "0,1 , 1 , 0,1";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=Atom {$words.add($w.text);} (',' w=Atom {$words.add($w.text);} )* EOF
  ;

Atom
  :  '0' | '1'
  ;

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
EDIT
To clarify: as you already found out, a grammar does not have to end with EOF; adding it only forces the parser to consume the entire input. In this grammar, a NoViableAltException comes from the lexer, which throws one when it stumbles upon a character that none of your lexer rules handles. Apart from whitespace, your grammar defines three tokens (0, 1 and ,), and your input, "00", contains no other characters, so no NoViableAltException is thrown. If you change your input to something like "0?0", a NoViableAltException will pop up.
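To see why "00" lexes cleanly while "0?0" does not, here is a minimal hand-written Java analogue of the lexer ANTLR builds for this grammar. It is a sketch, not ANTLR's generated code or API: TinyLexer and NoViableCharException are hypothetical names, and the exception merely stands in for NoViableAltException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written analogue (NOT ANTLR-generated code) of the lexer
// for the question's grammar: tokens '0', '1', ',' plus whitespace.
class TinyLexer {
    // Hypothetical stand-in for ANTLR's NoViableAltException.
    static class NoViableCharException extends RuntimeException {
        NoViableCharException(char c, int pos) {
            super("no viable alternative at character '" + c + "' (index " + pos + ")");
        }
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '0': case '1': case ',':
                    tokens.add(String.valueOf(c)); // one-character token
                    break;
                case ' ': case '\t': case '\r': case '\n': case '\f':
                    // Dropped here; ANTLR instead puts it on the HIDDEN channel,
                    // which the parser never sees.
                    break;
                default:
                    // No lexer rule can start with this character.
                    throw new NoViableCharException(c, i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("00")); // prints [0, 0]
        try {
            tokenize("0?0");
        } catch (NoViableCharException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is that the lexer never looks at token order: "00" is two perfectly valid tokens, so any complaint about it has to come from the parser.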
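Since your parser matches the first 0 and then does not find a comma, it simply stops parsing, because you did not "tell" it to parse all the way to the end of the input. With EOF at the end of mainRule, the same input "00" makes the parser report a syntax error at the second 0 instead (a mismatched-token error rather than a NoViableAltException).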
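The parser side can be sketched the same way. The class below is a hypothetical, hand-written Java analogue of mainRule working on already-lexed tokens, not ANTLR-generated code; the requireEof flag is an assumption for illustration that switches between the original rule and the version ending in EOF. Unlike ANTLR 3, which by default reports a syntax error and attempts recovery, this sketch simply throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogue (NOT ANTLR-generated code) of
//   mainRule : atom (',' atom)* ;        when requireEof == false
//   mainRule : Atom (',' Atom)* EOF ;    when requireEof == true
class TinyParser {
    private final List<String> tokens;
    private int pos = 0;

    TinyParser(List<String> tokens) { this.tokens = tokens; }

    // Current lookahead token, or null once the input is exhausted (EOF).
    private String la1() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String matchAtom() {
        String t = la1();
        if (!"0".equals(t) && !"1".equals(t)) {
            throw new IllegalStateException("expected atom at token " + pos + ", found " + t);
        }
        pos++;
        return t;
    }

    List<String> mainRule(boolean requireEof) {
        List<String> words = new ArrayList<>();
        words.add(matchAtom());
        // The (...)* loop continues only while the next token is ','.
        while (",".equals(la1())) {
            pos++; // consume ','
            words.add(matchAtom());
        }
        // Without the EOF check, leftover tokens are silently ignored.
        if (requireEof && la1() != null) {
            throw new IllegalStateException("expected EOF at token " + pos + ", found " + la1());
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("0", "0"); // tokens for "00"
        System.out.println(new TinyParser(input).mainRule(false)); // prints [0]
        try {
            new TinyParser(input).mainRule(true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints expected EOF at token 1, found 0
        }
    }
}
```

Run on the tokens of "00", the version without the EOF check returns [0] and ignores the second 0, which matches the output in the question; the version with it rejects the input.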
Hope that clarifies things. If not, let me know.