Seeming non-determinism in ANTLR parse

Posted: 2023-03-12 23:21

If I have an ANTLR grammar as follows:

grammar Test;
options {
  language = Java;
}

rule : (foo | bar);


foo : FOO ',' FOO;   
bar : BAR; 

FOO: ('0'..'9')+;
BAR: ('a'..'z' | 'A'..'Z' | '0'..'9' | ' ')+;
WHITESPACE: (' ' | '\t')+ { $channel=HIDDEN; };

And I use a test string:

12abc3

This (I believe) is a BAR token, which satisfies the bar rule and is parsed as such. Bravo.

However, if I use this string:

12

I receive: line 1:2 mismatched input '<EOF>' expecting ','

This seems rather non-deterministic, although I'm sure it's not. I understand I'm already in trouble by having two tokens, FOO and BAR, that both accept digits. But if the parser is going to succeed or fail, it should do so consistently. In the first case, the first character is a 1, and it is apparently evaluated as part of a BAR token, so the parser heads down a successful path. In the second case, the SAME first character is evaluated as part of a FOO token, so the path is doomed to fail, even though the string COULD be parsed successfully as a bar. Why the inconsistency? Or am I missing something more fundamental about ANTLR and/or parsing?

ANTLR doesn't determine the token type until it sees the first character of the next token (or EOF). ANTLR also attempts a longest match, which is why '12abc3' is lexed as a single BAR and not as FOO followed by BAR. In the second case, FOO and BAR both match '12' with the same length, so ANTLR uses FOO because it is listed first in the grammar.
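To make the longest-match rule and the declaration-order tie-break concrete, here is a minimal Java sketch that models the grammar's lexer rules as regular expressions. It is a simplified model, not ANTLR's generated lexer: ANTLR 3 predicts which lexer rule to enter with a DFA rather than trying every rule like this. The class name LongestMatchLexer and the COMMA rule (a stand-in for the implicit ',' literal token) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchLexer {
    // Lexer rules in grammar declaration order (LinkedHashMap preserves insertion order).
    // COMMA is a hypothetical stand-in for the implicit ',' literal token.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("COMMA", Pattern.compile(","));
        RULES.put("FOO", Pattern.compile("[0-9]+"));
        RULES.put("BAR", Pattern.compile("[a-zA-Z0-9 ]+"));
        RULES.put("WHITESPACE", Pattern.compile("[ \\t]+"));
    }

    /** Tokenizes the whole input; returns tokens as "TYPE:text" strings. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule declared earlier is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no viable token at " + pos);
            }
            // WHITESPACE goes to the hidden channel, so the parser never sees it.
            if (!bestType.equals("WHITESPACE")) {
                tokens.add(bestType + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12abc3")); // [BAR:12abc3]
        System.out.println(tokenize("12"));     // [FOO:12]
    }
}
```

On "12abc3", BAR matches six characters against FOO's two, so BAR wins outright. On "12", both match two characters and FOO wins only because it is declared first.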

ANTLR basics

ANTLR lexers

In addition to Adam's answer, you must realize that the lexer and parser, although defined in the same grammar, run in separate phases. First the input source is tokenized, and only after that has happened does the parser operate on these tokens. The tokens are not created while the parser goes through the source (character stream), so the parser cannot steer the lexer toward the match it needs (i.e., tokenize "12" as BAR). "12" is tokenized as FOO because the FOO rule comes before the BAR rule and therefore has higher precedence when both produce an equally long match.
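The two phases can be simulated in plain Java: the lexer turns the whole input into a token list first, and only then does a hand-written parser for rule : (foo | bar) look at it. This is a hypothetical simulation, not ANTLR's runtime or generated code; the names TwoPhaseDemo, lex, parse and expect are made up, and choosing the alternative from the first token alone is a simplification of ANTLR's own prediction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPhaseDemo {
    // Lexer rules in grammar order; on equal-length matches the lower index wins.
    // COMMA is a hypothetical stand-in for the implicit ',' literal token.
    static final String[] TYPES = {"COMMA", "FOO", "BAR", "WHITESPACE"};
    static final Pattern[] PATTERNS = {
        Pattern.compile(","),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[a-zA-Z0-9 ]+"),
        Pattern.compile("[ \\t]+")
    };

    // Phase 1: tokenize the entire input without any feedback from the parser.
    // Each token is {type, text}; an EOF token is appended at the end.
    public static List<String[]> lex(String in) {
        List<String[]> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(in).region(pos, in.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no viable token at " + pos);
            }
            if (!TYPES[best].equals("WHITESPACE")) { // hidden channel
                out.add(new String[] {TYPES[best], in.substring(pos, pos + bestLen)});
            }
            pos += bestLen;
        }
        out.add(new String[] {"EOF", "<EOF>"});
        return out;
    }

    // Phase 2: rule : (foo | bar); the parser only sees the finished token list.
    public static String parse(List<String[]> toks) {
        String first = toks.get(0)[0];
        if (first.equals("FOO")) {          // foo : FOO ',' FOO
            int p = expect(toks, 0, "FOO");
            p = expect(toks, p, "COMMA");
            expect(toks, p, "FOO");
            return "foo";
        }
        if (first.equals("BAR")) {          // bar : BAR
            expect(toks, 0, "BAR");
            return "bar";
        }
        throw new IllegalStateException("no viable alternative at input '" + toks.get(0)[1] + "'");
    }

    // Matches one token of the given type and returns the index of the next token.
    static int expect(List<String[]> toks, int p, String type) {
        String[] t = toks.get(p);
        if (!t[0].equals(type)) {
            throw new IllegalStateException("mismatched input '" + t[1] + "' expecting " + type);
        }
        return p + 1;
    }

    public static void main(String[] args) {
        System.out.println(parse(lex("12abc3"))); // bar
        try {
            parse(lex("12"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // mismatched input '<EOF>' expecting COMMA
        }
    }
}
```

Because the token list is already fixed when parse runs, "12" arrives as FOO, and the parser has no way to ask the lexer to reconsider it as BAR.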

In short: ANTLR grammars are not PEGs.
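For contrast, a PEG is typically scannerless and uses ordered choice with backtracking: if foo fails, bar is retried from the same input position, so "12" can still be recognized as a bar. The sketch below hand-codes PEG semantics for rule <- (foo / bar) !. in Java. It is not a PEG library, the class and method names are hypothetical, and the trailing end-of-input check (!.) is an assumption added so that the whole string must be consumed.

```java
public class PegSketch {
    // foo <- [0-9]+ ',' [0-9]+ ; returns the end index, or -1 on failure.
    static int foo(String s, int pos) {
        int p = digits(s, pos);
        if (p < 0 || p >= s.length() || s.charAt(p) != ',') {
            return -1;
        }
        return digits(s, p + 1);
    }

    // bar <- [a-zA-Z0-9 ]+ ; returns the end index, or -1 on failure.
    static int bar(String s, int pos) {
        int p = pos;
        while (p < s.length() && isBarChar(s.charAt(p))) {
            p++;
        }
        return p > pos ? p : -1;
    }

    // [0-9]+ matched greedily, as PEG repetition is.
    static int digits(String s, int pos) {
        int p = pos;
        while (p < s.length() && s.charAt(p) >= '0' && s.charAt(p) <= '9') {
            p++;
        }
        return p > pos ? p : -1;
    }

    static boolean isBarChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == ' ';
    }

    // rule <- (foo / bar) !. : ordered choice, then require end of input.
    public static String rule(String s) {
        String alt = "foo";
        int end = foo(s, 0);
        if (end < 0) {
            // foo failed: backtrack to position 0 and try bar on the same characters.
            alt = "bar";
            end = bar(s, 0);
        }
        return end == s.length() ? alt : "no match";
    }

    public static void main(String[] args) {
        System.out.println(rule("12"));     // bar
        System.out.println(rule("12abc3")); // bar
        System.out.println(rule("12,34"));  // foo
    }
}
```

Here "12" succeeds as a bar, which is exactly the behaviour the question expected; the ANTLR grammar cannot do this because its lexer has already committed to FOO before the parser runs.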
