开发者

Nested brackets/chars '(' and ')' in grammar/ANTLRWorks warning: Decision can match input such as ... using multiple alternatives

The grammar below parses ( left part = right part # comment ), # comment is optional.

Two questions:

  1. Sometimes warning (ANTLRWorks 1.4.2): Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2)

    But only sometimes!

  2. The next extension should be that the comment (id2) can contain chars '(' and ')'.

The grammar:

grammar NestedBrackets1a1;

//==========================================================
// Lexer Rules
//==========================================================

Int
  :  Digit+
  ;

fragment Digit
  :  '0'..'9'
  ;

Special
  : ( TCUnderscore | TCQuote )
  ;
TCListStart   : '(' ; 
TCListEnd     : ')' ;   
fragment TCUnderscore  : '_' ;
fragment TCQuote       : '"' ;

// A word must start with a letter
Word
  :  ( 'a'..'z' | 'A'..'Z' | Special ) ('a'.开发者_开发技巧.'z' | 'A'..'Z' | Special | Digit )*
  ;

Space
  :  ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
  ;

//==========================================================
// Parser Rules
//==========================================================

assignment
  :  TCListStart id1 '=' id1 ( comment )? TCListEnd
  ;

id1
  :  Word+
  ;

comment
  : '#' ( id2 )*
  ;

id2
  :  ( Word | Int )+
  ;


ANTLRStarter wrote:

Sometimes warning (ANTLRWorks 1.4.2): Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2 (referencing id2) But only sometimes!

No, the grammar you posted will always produce this warning. Perhaps you don't always notice it (your IDE-plugin or ANTLRWorks might show it in a tab you don't have opened), but the warning is there. Convince yourself by creating a lexer/parser from the command line:

java -cp antlr-3.4-complete.jar org.antlr.Tool NestedBrackets1a1.g

will produce:

warning(200): NestedBrackets1a1.g:49:19:
Decision can match input such as "{Int, Word}" using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input

This is because you have a * after ( id2 ) inside your comment rule, and id2 also is a repetition of tokens: ( Word | Int )+. Let's say your input is "# foo bar" (a # followed by two Word tokens). ANTLR can now parse the input in more than 1 way: the 2 tokens "foo" and "bar" could be matched by ( id2 )*, where id2 matches a single Word token at a time, but "foo" and "bar" could also be matches in one go of the id2 rule.

Look at the merged rules:

comment
  :  '#' ( ( Word | Int )+ )*
  ;

See how you're repeating a repetition: ( ( ... )+ )*? This is usually a problem, as it is in your case.

Resolve this problem by either replacing the * with a ?:

comment
  :  '#' ( id2 )?
  ;

id2
  :  ( Word | Int )+
  ;

or by removing the +:

comment
  :  '#' ( id2 )*
  ;

id2
  :  ( Word | Int )
  ;

ANTLRStarter wrote:

The next extension should be that the comment (id2) can contain chars '(' and ')'.

That is asking for trouble since a comment is followed by a TCListEnd, which is a ). I don't recommend letting a comment match ).

EDIT

Note that comments are usually stripped from the source file while tokenizing the input source. That way you don't need to account for them in your parser rules. You can do that by "skipping" these tokens in a lexer rule:

Comment
  :  '#' ~('\r' | '\n')* {skip();}
  ; 
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜