Returning meaningful error messages from a parser written with Scala Parser Combinators
I am trying to write a parser in Scala using parser combinators. If I match recursively,
def body: Parser[Body] =
  ("begin" ~> statementList) ^^ {
    case s => new Body(s)
  }
def statementList: Parser[List[Statement]] =
  ("end" ^^ { _ => List() }) |
  (statement ~ statementList ^^ { case statement ~ statementList => statement :: statementList })
then I get good error messages whenever there is a fault in a statement. However, this code is ugly and long, so I'd like to write this instead:
def body: Parser[Body] =
  ("begin" ~> statementList <~ "end") ^^ {
    case s => new Body(s)
  }
def statementList: Parser[List[Statement]] =
  rep(statement)
This code works, but only prints meaningful messages if there is an error in the FIRST statement. If it is in a later statement, the message becomes painfully unusable, because the parser wants to see the whole erroneous statement replaced by the "end" token:
Exception in thread "main" java.lang.RuntimeException: [4.2] error: "end" expected but "let" found
let b : string = x(3,b,"WHAT???",!ERRORHERE!,7 )
^
My question: is there a way to get rep and repsep working in combination with meaningful error messages that place the caret in the right place, instead of at the beginning of the repeating fragment?
You can do it by combining a "home made" rep method with non-backtracking (the ~! combinator) inside statements. For example:
scala> object X extends RegexParsers {
| def myrep[T](p: => Parser[T]): Parser[List[T]] = p ~! myrep(p) ^^ { case x ~ xs => x :: xs } | success(List())
| def t1 = "this" ~ "is" ~ "war"
| def t2 = "this" ~! "is" ~ "war"
| def t3 = "begin" ~ rep(t1) ~ "end"
| def t4 = "begin" ~ myrep(t2) ~ "end"
| }
defined module X
scala> X.parse(X.t4, "begin this is war this is hell end")
res13: X.ParseResult[X.~[X.~[String,List[X.~[X.~[String,String],String]]],String]] =
[1.27] error: `war' expected but ` ' found
begin this is war this is hell end
^
scala> X.parse(X.t3, "begin this is war this is hell end")
res14: X.ParseResult[X.~[X.~[String,List[X.~[X.~[String,String],String]]],String]] =
[1.19] failure: `end' expected but ` ' found
begin this is war this is hell end
^
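The same idea can be applied to a grammar shaped like the one in the question. The following is only a minimal sketch: the names BodyParser, ident and number, and the toy "let <name> = <number>" statement, are made up for illustration and are not from the original code. It relies on ~! committing to everything that follows it in the same sequence, so a failure inside a later statement becomes an error that the surrounding | cannot backtrack over.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical mini-grammar for illustration: begin (let <name> = <number>)* end
object BodyParser extends RegexParsers {
  def ident: Parser[String] = "[a-z]+".r
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // Home-made rep: after one p has succeeded, ~! commits to the rest of the
  // repetition, so an error inside a later p is not swallowed by the
  // `| success(List())` alternative.
  def myrep[T](p: => Parser[T]): Parser[List[T]] =
    p ~! myrep(p) ^^ { case x ~ xs => x :: xs } | success(List())

  // Once the keyword "let" has matched, commit to this statement:
  // any later mismatch is reported as an error at that position.
  def statement: Parser[(String, Int)] =
    "let" ~! ident ~ "=" ~ number ^^ { case _ ~ n ~ _ ~ v => (n, v) }

  def body: Parser[List[(String, Int)]] =
    "begin" ~ myrep(statement) ~ "end" ^^ { case _ ~ stmts ~ _ => stmts }

  def main(args: Array[String]): Unit = {
    // Well-formed input parses to a list of (name, value) pairs.
    println(parse(body, "begin let a = 1 let b = 2 end"))
    // The fault is in the second statement; the message should point into
    // that statement rather than asking for "end" at its start.
    println(parse(body, "begin let a = 1 let b = x end"))
  }
}
```

The trade-off is that a committed statement can no longer be retried as something else, so this only fits grammars where the leading token ("let" here) already decides which rule applies.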
Ah, found the solution! It turns out that you need to use the function phrase on your main parser. It returns a new parser that tracks the last position at which a failure occurred, so the error is reported there rather than at the point the parser backtracked to.
I changed:
def parseCode(code: String): Program = {
  program(new lexical.Scanner(code)) match {
    case Success(program, _) => program
    case x: Failure => throw new RuntimeException(x.toString())
    case x: Error => throw new RuntimeException(x.toString())
  }
}
def program : Parser[Program] ...
into:
def parseCode(code: String): Program = {
  phrase(program)(new lexical.Scanner(code)) match {
    case Success(program, _) => program
    case x: Failure => throw new RuntimeException(x.toString())
    case x: Error => throw new RuntimeException(x.toString())
  }
}
def program : Parser[Program] ...
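To see the effect of phrase in isolation, here is a small self-contained sketch using RegexParsers instead of the token-based parser above. The object PhraseDemo and its toy "let <name> = <number>" grammar are assumptions made up for this example. Note that parseAll(p, in) in RegexParsers is defined as parse(phrase(p), in), so it gives the same behaviour.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical mini-grammar with the same shape as the question:
// begin <statements> end, using plain rep and ordinary backtracking.
object PhraseDemo extends RegexParsers {
  def ident: Parser[String] = "[a-z]+".r
  def number: Parser[Int] = "[0-9]+".r ^^ (_.toInt)

  // let <name> = <number>
  def statement: Parser[(String, Int)] =
    "let" ~> ident ~ ("=" ~> number) ^^ { case n ~ v => (n, v) }

  def body: Parser[List[(String, Int)]] =
    "begin" ~> rep(statement) <~ "end"

  def main(args: Array[String]): Unit = {
    val input = "begin let a = 1 let b = x end" // fault in the 2nd statement
    // Without phrase: depending on the library version, this may complain
    // that "end" was expected at the start of the second statement.
    println(parse(body, input))
    // With phrase: the failure that got furthest into the input is reported,
    // which is the one inside the second statement.
    println(parse(phrase(body), input))
  }
}
```

Unlike the ~! approach, this does not change the grammar at all; it only changes which failure is shown. phrase also requires that the whole input is consumed, which is usually what you want for a top-level parser.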