Parsing ambiguous input with ANTLR
I have been trying for a few days to parse some input that consists of words and numbers (I've called it a sentence in my grammar):
sentence options {
greedy=false;
}
: (ANY_WORD | INT)+;
I also have a rule that needs to match a sentence that ends with an INT:
sentence_with_int
: sentence INT;
So if I had some input such as "the number of size 14 shoes bought was 3", then sentence_with_int should be matched, not just sentence. I'm sure there is a better way to do this, but I'm just learning the tool.
Thanks, Richard
Your grammar:
grammar Test;
sentence_with_int
: sentence {System.out.println("Parsed: sentence='"+$sentence.text+"'");}
INT {System.out.println("Parsed: int='"+$INT.text+"'");}
;
sentence
: (ANY_WORD | INT)+
;
ANY_WORD
: ('a'..'z' | 'A'..'Z')+
;
INT
: ('0'..'9')+
;
WS
: (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
;
does exactly that. Here's a little test harness:
import org.antlr.runtime.*;
public class Demo {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("the number of size 14 shoes bought was 3");
TestLexer lexer = new TestLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
parser.sentence_with_int();
}
}
First generate a parser & lexer (assuming all your files, and the ANTLR jar, are in the same directory):
java -cp antlr-3.2.jar org.antlr.Tool Test.g
and compile all .java source files:
javac -cp antlr-3.2.jar *.java
and finally run the Demo class:
java -cp .:antlr-3.2.jar Demo
(on Windows, replace the : with a ;)
which produces the following output:
Parsed: sentence='the number of size 14 shoes bought was'
Parsed: int='3'
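One way to picture why the 14 ends up inside sentence while the 3 does not: the (ANY_WORD | INT)+ loop only takes a token when something still follows it, so the last token is left over for the trailing INT. The plain-Java sketch below imitates that decision by hand. It is not the code ANTLR generates, it needs no ANTLR jar, and the names LookaheadSketch, tokenize and parse are made up for this illustration. It also assumes the input contains only letters, digits and whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hand-written stand-in for the generated parser.
// All class and method names here are hypothetical.
public class LookaheadSketch {

    // Mirrors the lexer rules: ANY_WORD is a run of letters, INT a run of digits.
    // Whitespace is skipped, much like the WS rule hides it from the parser.
    // Unlike a real lexer, this silently ignores any other character.
    private static final Pattern TOKEN = Pattern.compile("[a-zA-Z]+|[0-9]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Returns { sentence text, final INT text }.
    // Throws if the input is not "one or more tokens followed by an INT".
    static String[] parse(String input) {
        List<String> tokens = tokenize(input);
        List<String> sentence = new ArrayList<>();
        int i = 0;

        // (ANY_WORD | INT)+ : consume the current token only if another
        // token comes after it; the very last token stays unconsumed.
        while (i < tokens.size() - 1) {
            sentence.add(tokens.get(i));
            i++;
        }
        if (sentence.isEmpty()) {
            throw new IllegalArgumentException("sentence needs at least one token");
        }

        // The token that is left over must be an INT.
        String last = tokens.get(i);
        if (!last.matches("[0-9]+")) {
            throw new IllegalArgumentException("input must end with an INT, got: " + last);
        }

        // Tokens are re-joined with single spaces; the real $sentence.text
        // would keep the original whitespace instead.
        return new String[] { String.join(" ", sentence), last };
    }

    public static void main(String[] args) {
        String[] result = parse("the number of size 14 shoes bought was 3");
        System.out.println("Parsed: sentence='" + result[0] + "'");
        System.out.println("Parsed: int='" + result[1] + "'");
    }
}
```

Running main on the same input as the Demo class should print the same two lines as above. An input that ends in a word, such as "size 14 shoes", makes parse throw, which loosely corresponds to the parser reporting a mismatched token.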