Using ANTLR to parse a log file
I'm just starting out with ANTLR and trying to parse some patterns out of a log file.
For example, the log file contains:
7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=["red","yellow"]){}
7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=["Rocket"]){}
7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=["blue","yellow"]){}
7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=["Speech"]){}
Now I have to parse this file to find only 'Evaluation.Input.Function1' and its values 'red' and 'yellow', and 'Evaluation.Output.Function2' and its value 'Rocket', ignoring everything else. The same goes for the other two input and output functions (3 and 4) further down. There are many such Input and Output functions, and I have to find these sets of input/output functions. This is my attempted grammar, which is not working. Any help would be appreciated. As this is my first attempt at writing a grammar with ANTLR, it is becoming quite daunting.
grammar test;
tag : inputtag+ outputtag+ ;
//Input tag consists of at least one input function with one or more values
inputtag: INPUTFUNCTIONS INPUTVALUES+;
//Output tag consists of at least one output function with one or more output values
outputtag : OUTPUTFUNCTIONS OUTPUTVALUES+;
INPUTFUNCTIONS
: INFUNCTION1 | INFUNCTION2;
OUTPUTFUNCTIONS
:OUTFUNCTION1 | OUTFUNCTION2;
// Possible input functions in the log file
fragment INFUNCTION1
:'Evaluation.Input.Function1';
fragment INFUNCTION2
:'Evaluation.Input.Function3';
//Possible values in the input functions
INPUTVALUES
: 'red' | 'yellow' | 'blue';
// Possible output functions in the log file
fragment OUTFUNCTION1
:'Evaluation.Output.Function2';
fragment OUTFUNCTION2
:'Evaluation.Output.Function4';
//Possible output values in the output functions
fragment OUTPUTVALUES
: 'Rocket' | 'Speech';
When you're only interested in part of the file you're parsing, you don't need a parser, nor a grammar for the entire format of the file. A lexer grammar with ANTLR's options{filter=true;} will suffice. That way, you grab only the tokens you defined in your grammar and ignore the rest of the file.
Here's a quick demo:
lexer grammar TestLexer;
options{filter=true;}
@lexer::members {
  public static void main(String[] args) throws Exception {
    String text =
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"+
        "\n"+
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    for(Object obj : tokens.getTokens()) {
      Token token = (Token)obj;
      System.out.println("> token.getText() = "+token.getText());
    }
  }
}
Input
: 'Evaluation.Input.Function' '0'..'9'+ Params
;
Output
: 'Evaluation.Output.Function' '0'..'9'+ Params
;
fragment
Params
: '(selected=[' String ( ',' String )* '])'
;
fragment
String
: '"' ( ~'"' )* '"'
;
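Save the grammar as TestLexer.g, then generate, compile and run the lexer:
java -cp antlr-3.2.jar org.antlr.Tool TestLexer.g
javac -cp antlr-3.2.jar TestLexer.java
java -cp .:antlr-3.2.jar TestLexer
(On Windows, use ; as the classpath separator: java -cp .;antlr-3.2.jar TestLexer)
You'll see the following being printed to the console: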
> token.getText() = Evaluation.Input.Function1(selected=["red","yellow"])
> token.getText() = Evaluation.Output.Function2(selected=["Rocket"])
> token.getText() = Evaluation.Input.Function3(selected=["blue","yellow"])
> token.getText() = Evaluation.Output.Function4(selected=["Speech"])
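If you then need the function name and the selected values separately, as the question asks, one option is to post-process each token's text in plain Java. The following is only a minimal sketch and not part of ANTLR: the class TokenTextSplitter and its methods functionName and values are hypothetical names, and it assumes every token text has exactly the shape printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): splits the text of an Input/Output
// token into its function name and its quoted values.
public class TokenTextSplitter {

    // Everything before the first '(' is taken as the function name.
    private static final Pattern NAME = Pattern.compile("^([A-Za-z0-9.]+)\\(");

    // Each double-quoted string is taken as one value.
    private static final Pattern VALUE = Pattern.compile("\"([^\"]*)\"");

    public static String functionName(String tokenText) {
        Matcher m = NAME.matcher(tokenText);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected token text: " + tokenText);
        }
        return m.group(1);
    }

    public static List<String> values(String tokenText) {
        List<String> result = new ArrayList<String>();
        Matcher m = VALUE.matcher(tokenText);
        while (m.find()) {
            result.add(m.group(1)); // group 1 excludes the surrounding quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Evaluation.Input.Function1(selected=[\"red\",\"yellow\"])";
        // prints: Evaluation.Input.Function1 -> [red, yellow]
        System.out.println(functionName(text) + " -> " + values(text));
    }
}
```

Inside the demo's loop you could call TokenTextSplitter.functionName(token.getText()) and TokenTextSplitter.values(token.getText()) instead of printing the raw text. Whether a token is an input or an output can be read from the name itself, since it contains either .Input. or .Output. as in the log lines above.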