开发者

Extracting recursion in ANTLR

I've got a grammar in ANTLR and don't understand how it can be recursive. Is there any way to get ANTLR to show the derivation that it used to see that my rules are recursive?

The recursive grammar in it's entirety:

grammar DeadMG;

options {
    language = C;
}

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HI开发者_JAVA百科DDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

CHAR:  '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

program 
    : namespace_scope_definitions;

namespace_scope_definitions
    : (namespace_definition | type_definition | function_definition | variable_definition)+;

type_scope_definitions
    : (type_definition | function_definition | variable_definition)*;   

namespace_definition
    : 'namespace' ID '{' namespace_scope_definitions '}';

type_definition 
    : 'type' ID? (':' expression (',' expression)+ )? '{' type_scope_definitions '}';

function_definition
    : ID '(' function_argument_list ')' ('(' function_argument_list ')')? ('->' expression)? compound_statement;

function_argument_list
    : expression? ID (':=' expression)? (',' function_argument_list)?;

variable_definition
    : 'static'? expression? ID ':=' expression 
    | 'static'? expression ID ('(' (expression)* ')')?;

literal_expression
    : CHAR
    | FLOAT
    | INT
    | STRING
    | 'auto'
    | 'type'
    | type_definition;

primary_expression
    : literal_expression
    | ID
    | '(' expression ')';

expression 
    : assignment_expression;

assignment_expression
    : logical_or_expression (('=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<='| '>>=' | '&=' | '^=' | '|=') assignment_expression)*;

logical_or_expression
    : logical_and_expression ('||' logical_and_expression)*;

logical_and_expression
    : inclusive_or_expression ('&&' inclusive_or_expression)*;

inclusive_or_expression
    : exclusive_or_expression ('|' exclusive_or_expression)*;

exclusive_or_expression
    : and_expression ('^' and_expression)*;

and_expression
    : equality_expression ('&' equality_expression)*;

equality_expression
    : relational_expression (('=='|'!=') relational_expression)*;

relational_expression
    : shift_expression (('<'|'>'|'<='|'>=') shift_expression)*;

shift_expression
    : additive_expression (('<<'|'>>') additive_expression)*;

additive_expression
    : multiplicative_expression (('+' multiplicative_expression) | ('-' multiplicative_expression))*;

multiplicative_expression
    : unary_expression (('*' | '/' | '%') unary_expression)*;

unary_expression
    : '++' primary_expression
    | '--' primary_expression
    | ('&' | '*' | '+' | '-' | '~' | '!') primary_expression
    | 'sizeof' primary_expression
    | postfix_expression;

postfix_expression
    : primary_expression
    | '[' expression ']'
    | '(' expression* ')'
    | '.' ID
    | '->' ID
    | '++' 
    | '--';

initializer_statement
    : expression ';'
    | variable_definition ';';

return_statement
    : 'return' expression ';';

try_statement
    : 'try' compound_statement catch_statement;

catch_statement
    : 'catch' '(' ID ')' compound_statement catch_statement?
    | 'catch' '(' '...' ')' compound_statement;

for_statement
    : 'for' '(' initializer_statement expression? ';' expression? ')' compound_statement;

while_statement
    : 'while' '(' initializer_statement ')' compound_statement;

do_while_statement
    : 'do' compound_statement 'while' '(' expression ')';

switch_statement
    : 'switch' '(' expression ')' '{' case_statement '}';

case_statement
    : 'case:' (statement)* case_statement?
    | 'default:' (statement)*;

if_statement
    : 'if' '(' initializer_statement ')' compound_statement;

statement
    : compound_statement
    | return_statement
    | try_statement
    | initializer_statement
    | for_statement
    | while_statement
    | do_while_statement
    | switch_statement
    | if_statement;

compound_statement
    : '{' (statement)* '}';

More specifically, I am having trouble with the following rules:

namespace_scope_definitions
    : (namespace_definition | type_definition | function_definition | variable_definition)+;

type_scope_definitions
    : (type_definition | function_definition | variable_definition)*;   

ANTLR is saying that alternatives 2 and 4, that is, type_definition and variable_definition, are recursive. Here's variable_definition:

variable_definition
: 'static'? expression? ID ':=' expression 
| 'static'? expression ID ('(' (expression)* ')')?;

and here's type_definition:

type_definition 
: 'type' ID? (':' expression (',' expression)+ )? '{' type_scope_definitions '}';

'type' itself, and type_definition, is a valid expression in my expression syntax. However, removing it is not resolving the ambiguity, so it doesn't originate there. And I have plenty of other ambiguities I need to resolve- detailing all the warnings and errors would be quite too much, so I'd really like to see more details on how they are recursive from ANTLR itself.


My suggestion is to remove most of the operator precedence rules for now:

expression 
    : multiplicative_expression
      (
        ('+' multiplicative_expression)
      | ('-' multiplicative_expression)
      )*;

and then inline the rules that have a single caller to isolate the ambiguities. Yes it is tedious.


I found a few ambiguities in the grammar, fixed them and got a lot less warnings. However, I think that probably, LL is just not the right parsing algorithm for me. I am writing a custom parser and lexer. It would still have been nice if ANTLR would show me how it found the problems though, so that I might intervene and fix them.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜