Parser stops mid-parse
I am completely out of ideas. I have spent every free minute today on this, and I still cannot see what is wrong.
This is my ocamlyacc grammar:
input:
    /* empty */ { }
  | input stmt { }
stmt:
    extern { print_endline "Got an extern import" }
  | func { print_endline "Got function definition" }
  | call { print_endline "Got function call" }
extern:
    EXTERN proto { Extern $2 }
func:
    DEF proto expr { Function ($2, $3) }
proto:
    IDENTIFIER LPAREN id_list RPAREN { print_endline "Got prototype definition"; Prototype ($1, $3) }
id_list:
    /* empty */ { [] }
  | IDENTIFIER { [$1] }
  | id_list COMMA IDENTIFIER { $3 :: $1 }
expr_list:
    /* empty */ { [] }
  | expr { [$1] }
  | expr_list COMMA expr { $3 :: $1 }
expr:
    call { $1 }
  | expr OP expr { Binary ($2, $1, $3) }
  | IDENTIFIER { Variable $1 }
  | NUMBER { Number $1 }
  | LPAREN expr RPAREN { $2 }
call:
    IDENTIFIER LPAREN expr_list RPAREN { Call ($1, $3) }
When I start parsing def foo(a,b) a+b, it should tell me it got a function and a prototype declaration, according to the debug messages. But instead, I only get the message from the proto rule.
Further debug messages show that the parser gets as far as the a of the expression a+b and then stops. No error message, nothing else. It just stops as if the entire text had been parsed completely without matching any of the rules in stmt.
There are no shift/reduce errors or similar. The AST types are not the problem either. I have no idea any more; maybe someone else can help. Surely it is something obvious, but I cannot see it.
EDIT: Lexer by popular demand:
{
open Parser
}
rule token = parse
| [' ' '\t' '\n'] { token lexbuf }
| "def" { DEF }
| "extern" { EXTERN }
| "if" { IF }
| "then" { THEN }
| "else" { ELSE }
| ['+' '-' '*' '/'] as c { OP c }
| ['A'-'Z' 'a'-'z'] ['A'-'Z' 'a'-'z' '0'-'9' '_']* as id { IDENTIFIER id }
| ['0'-'9']*'.'['0'-'9']+ as num { NUMBER (float_of_string num) }
| '(' { LPAREN }
| ')' { RPAREN }
| ',' { COMMA }
| '#' { comment lexbuf }
| _ { raise Parsing.Parse_error }
| eof { raise End_of_file }
and comment = parse
| '\n' { token lexbuf }
| _ { comment lexbuf }
First point: I hated you a bit for not giving compilable source code. I had to reinvent the AST types, the %token declarations, etc. to test your code.
The problem is a delicate interplay between the
| eof { raise End_of_file }
lexing rule, and your grammar.
Raising End_of_file on EOF in the lexer is a good idea if your grammar never naturally encounters the end of the file. For example, grammars that are naturally \n-terminated or ;;-terminated will stop parsing at the terminator, and never get to the EOF token.
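But your grammar isn't one of those. When the parser gets to DEF proto expr . it asks for the next token to see whether it is, by chance, an OP. So it calls the lexer, which finds EOF and blows up. The exception leaves the parser before the func and stmt rules are reduced, which is why their messages never print.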
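To make the mechanism concrete, here is a minimal self-contained sketch. It is a hand-written stand-in, not the automaton ocamlyacc generates, and every name in it (make_lexer, parse_expr, the reduced token type) is hypothetical. It only shows that a parser which must peek one token past a+b loses the whole expression when the lexer raises, and keeps it when the lexer returns an EOF token. The try ... with End_of_file handler is an assumption about the driver, which the question does not show.

```ocaml
(* Hypothetical, reduced token and AST types; not the question's full set. *)
type token = IDENTIFIER of string | OP of char | EOF

type expr = Variable of string | Binary of char * expr * expr

(* A lexer over a fixed token list. [at_end] decides what happens once
   the list is exhausted: raise an exception or return a token. *)
let make_lexer tokens at_end =
  let rest = ref tokens in
  fun () ->
    match !rest with
    | [] -> at_end ()
    | t :: ts -> rest := ts; t

(* Parses IDENTIFIER (OP IDENTIFIER)*. After each operand the parser has
   to ask for one more token to learn whether the expression continues. *)
let parse_expr next =
  let operand () =
    match next () with
    | IDENTIFIER id -> Variable id
    | _ -> failwith "expected identifier"
  in
  let rec loop left =
    match next () with (* the lookahead request *)
    | OP c -> loop (Binary (c, left, operand ()))
    | EOF -> left (* input ended cleanly: return what was built *)
    | IDENTIFIER _ -> failwith "unexpected identifier"
  in
  loop (operand ())

let input = [IDENTIFIER "a"; OP '+'; IDENTIFIER "b"]

(* Variant 1: the lexer raises at end of input, as in the question.
   The exception escapes parse_expr, so no expression is ever returned. *)
let raising =
  try Some (parse_expr (make_lexer input (fun () -> raise End_of_file)))
  with End_of_file -> None

(* Variant 2: the lexer returns an EOF token, as in the fix. *)
let with_token = Some (parse_expr (make_lexer input (fun () -> EOF)))
```

In the first variant the result is None even though every token of a+b was consumed; in the second the full Binary node comes back.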
Here is my fix:
In lex.mll:
| eof { EOF }
In parse.mly:
%token EOF
%start stmt_eof
%type <Types.stmt> stmt_eof
[...]
stmt_eof: stmt EOF { $1 }
Finally, you should seriously consider Menhir as a replacement for ocamlyacc. It does everything ocamlyacc does, only better, with clearer grammar files (e.g. you wouldn't have to reinvent the foo_list nonterminal each time), better error messages, debugging features...
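As a rough illustration of the foo_list point, the proto and call rules might look like this in a Menhir grammar, using separated_list from Menhir's standard library and named semantic values. This is an untested fragment, assuming the same tokens and AST constructors as the question. Note that separated_list yields the elements in source order, whereas the $3 :: $1 actions in the original build the list reversed.

```ocaml
proto:
  | name = IDENTIFIER LPAREN args = separated_list(COMMA, IDENTIFIER) RPAREN
      { Prototype (name, args) }

call:
  | name = IDENTIFIER LPAREN args = separated_list(COMMA, expr) RPAREN
      { Call (name, args) }
```

With these two rules the id_list and expr_list nonterminals would no longer be needed.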