Parsing Newlines, EOF as End-of-Statement Marker with ANTLR3

2023-02-26 01:20 问答作者：

My question is in regards to running the following grammar in ANTLRWorks:

INT :('0'..'9')+;
SEMICOLON: ';';
NEWLINE: ('\r\n'|'\n'|'\r');
STMTEND: (SEMICOLON (NEWLINE)*|NEWLINE+);

statement
    : STMTEND
    | INT STMTEND
    ;

program: statement+;

I get the following results with the following input (with program as the start rule), regardless of which newline NL (CR/LF/CRLF) or integer I choose:

"; NL" or "32; NL" parses without error. ";" or "45;" (without newlines) result in EarlyExitException. "NL" by itself parses without error. "456 NL", without the semicolon, results in MismatchedTokenException.

What I want is for a statement to be terminated by a newline, semicolon, or semicolon followed by newline, and I want the parser to eat as many contiguous newlines as it can on a termination, so "; NL NL NL NL" is just one termination, not four or five. Also, I would like the end-of-file case to be a valid termination as well, but I don't know how to do that yet.

So what's wrong with this, and how can I make this terminate nicely at EOF? I'm completely new to all of parsing, ANTLR, and EBNF, and I haven't found much material to read on it at a level somewhere in between the simple calculator example and the reference (I have The Definitive ANTLR Reference, but it really is a reference, with a quick start in the front which I haven't yet got to run outside of ANTLRWorks), so any reading su开发者_如何转开发ggestions (besides Wirth's 1977 ACM paper) would be helpful too. Thanks!

In case of input like ";" or "45;", the token STMTEND will never be created.

";" will create a single token: SEMICOLON, and "45;" will produce: INT SEMICOLON.

What you (probably) want is that SEMICOLON and NEWLINE never make it to real tokens themselves, but they will always be a STMTEND. You can do that by making them so called "fragment" rules:

program: statement+;

statement
 : STMTEND
 | INT STMTEND
 ;

INT     : '0'..'9'+;
STMTEND : SEMICOLON NEWLINE* | NEWLINE+;

fragment SEMICOLON : ';';
fragment NEWLINE   : '\r' '\n' | '\n' | '\r';

Fragment rules are only available for other lexer rules, so they will never end up in parser (production) rules. To emphasize: the grammar above will only ever create either INT or STMTEND tokens.

继续阅读：antlr antlr3 antlrworks

Parsing Newlines, EOF as End-of-Statement Marker with ANTLR3

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？