Get original text of an Antlr rule

2023-04-05 03:35 Q&A author:

I am an ANTLR beginner and want to calculate a SHA-1 hash of symbols.

My simplified example grammar:

grammar Example;

method @after{calculateSha1($text); }: 'call' ID;

ID: 'A'..'Z'+;
WS: (' '|'\n'|'\r')+ {skip();};
COMMENT: '/*' (options {greedy=false;}: .)* '*/' {$channel=HIDDEN;};

Because the lexer removes all whitespace, the different strings callABC and call /* DEF */ ABC unfortunately get the same SHA-1 hash value.

Is it possible to get the "original" text of a rule between the start and end tokens, including all the skipped whitespace and the text on the other channels?

(One possibility that comes to mind is to remember all characters in the WS and COMMENT lexer rules, but there are many more rules, so this isn't very practical.)

I use the standard ANTLRInputStream to feed the Lexer, but I don't know how to receive the original text.

Instead of skip()-ping the WS token, put it on the HIDDEN channel as well:

grammar Example;

@parser::members {
  void calculateSha1(String text) {
    try {
      java.security.MessageDigest md = java.security.MessageDigest.getInstance("SHA-1");
      byte[] sha1 = md.digest(text.getBytes());
      System.out.println(text + "\n" + java.util.Arrays.toString(sha1) + "\n");
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

parse 
  :  method+ EOF
  ;

method
@after{calculateSha1($text);}
  :  'call' ID
  ;

ID      : 'A'..'Z'+;
WS      : (' ' | '\t' | '\n' | '\r')+ {$channel=HIDDEN;};
COMMENT : '/*' .* '*/' {$channel=HIDDEN;};

The grammar above can be tested with:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "call ABC call /* DEF */ ABC";
    ExampleLexer lexer = new ExampleLexer(new ANTLRStringStream(source));
    ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

which will print the following to the console:

call ABC
[48, -45, 113, 5, -52, -128, -78, 75, -52, -97, -35, 25, -55, 59, -85, 96, -58, 58, -96, 10]

call /* DEF */ ABC
[-57, -2, -115, -104, 77, -37, 4, 93, 116, -123, -47, -4, 33, 42, -68, -95, -43, 91, 94, 77]

That is, the same parser rule matches both inputs, yet $text differs between them (and therefore so do the SHA-1 hashes).
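The signed byte arrays printed above are hard to compare by eye. As a minimal sketch (not part of the original answer), the digest could be rendered as a hex string instead. The class name Sha1Hex and the helper sha1Hex are hypothetical names chosen for illustration, and the explicit UTF-8 encoding is an assumption; the answer's code uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name is not from the original answer.
public class Sha1Hex {

  // Returns the SHA-1 digest of text as a 40-character lowercase hex string.
  // Assumption: the text is encoded as UTF-8 before hashing.
  static String sha1Hex(String text) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        // %02x formats a byte as two hex digits, treating it as unsigned.
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-1 is a standard algorithm name that every Java platform must support.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The two rule texts from the example above hash to different values.
    System.out.println(sha1Hex("call ABC"));
    System.out.println(sha1Hex("call /* DEF */ ABC"));
  }
}
```

Inside the grammar, the body of calculateSha1 could delegate to such a helper, so that each matched method rule prints one compact line per hash.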
