Get original text of an ANTLR rule
I am an ANTLR beginner and want to calculate a SHA1-Hash of symbols.
My simplified example grammar:
grammar Example;
method @after{calculateSha1($text); }: 'call' ID;
ID: 'A'..'Z'+;
WS: (' '|'\n'|'\r')+ {skip();};
COMMENT: '/*' (options {greedy=false;}: .)* '*/' {$channel=HIDDEN;};
As the lexer removes all whitespace, the different strings callABC and call /* DEF */ ABC unfortunately get the same SHA1 hash value.
Is it possible to get the "original" text of a rule between the start- and end-token with all the skipped whitespaces and the text of the other channels?
(One possibility that comes to mind is to remember all characters in the WS and COMMENT lexer rules, but there are many more rules, so this isn't very practical.)
I use the standard ANTLRInputStream to feed the Lexer, but I don't know how to receive the original text.
Instead of skip()-ping the WS token, put it on the HIDDEN channel as well:
grammar Example;
@parser::members {
void calculateSha1(String text) {
try {
java.security.MessageDigest md = java.security.MessageDigest.getInstance("SHA-1");
byte[] sha1 = md.digest(text.getBytes());
System.out.println(text + "\n" + java.util.Arrays.toString(sha1) + "\n");
} catch(Exception e) {
e.printStackTrace();
}
}
}
parse
: method+ EOF
;
method
@after{calculateSha1($text);}
: 'call' ID
;
ID : 'A'..'Z'+;
WS : (' ' | '\t' | '\n' | '\r')+ {$channel=HIDDEN;};
COMMENT : '/*' .* '*/' {$channel=HIDDEN;};
The grammar above can be tested with:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "call ABC call /* DEF */ ABC";
ExampleLexer lexer = new ExampleLexer(new ANTLRStringStream(source));
ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));
parser.parse();
}
}
which will print the following to the console:
call ABC
[48, -45, 113, 5, -52, -128, -78, 75, -52, -97, -35, 25, -55, 59, -85, 96, -58, 58, -96, 10]

call /* DEF */ ABC
[-57, -2, -115, -104, 77, -37, 4, 93, 116, -123, -47, -4, 33, 42, -68, -95, -43, 91, 94, 77]
i.e., the same parser rule, yet different $text values (and therefore different SHA1 hashes).
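The calculateSha1 method above prints the digest as a raw signed byte array and encodes the text with the platform default charset. If a conventional hex string is preferred, a minimal sketch along these lines might help. The class name Sha1Hex and the method sha1Hex are hypothetical names chosen for illustration, and UTF-8 is an assumption about the desired encoding:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of the grammar above): renders a SHA-1 digest as hex.
final class Sha1Hex {

    // Hashes the UTF-8 bytes of the text and returns 40 lowercase hex characters.
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // Assumption: UTF-8, so the hash does not depend on the platform charset.
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still yield two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // The two inputs differ only in hidden-channel text, so the hashes differ.
        System.out.println(sha1Hex("call ABC"));
        System.out.println(sha1Hex("call /* DEF */ ABC"));
    }
}
```

Inside the grammar, the body of calculateSha1 could delegate to such a helper, since $text already contains the hidden-channel whitespace and comments.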