开发者

How to combine Regexp and keywords in Scala parser combinators

I've seen two approaches to building parsers in Scala.

The first is to extends from RegexParsers and define your won lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keyword match the same pattern as idents, then it processes the keywords as idents.

To counter that, I've seen posts like this one that show how to use the StandardTokenParsers to specify keywords. But then, I don't understand how to s开发者_StackOverflow社区pecify the regexp patterns! Yes, StandardTokenParsers comes with "ident" but it doesn't come with the other ones I need (complex floating point number representations, specific string literal patterns and rules for escaping, etc).

How do you get both the ability to specify keywords and the ability to specify token patterns with regular expressions?


I've written only RegexParsers-derived parsers, but what I do is something like this:

val name: Parser[String] = "[A-Z_a-z][A-Z_a-z0-9]*".r

val kwIf: Parser[String]    = "if\\b".r
val kwFor: Parser[String]   = "for\\b".r
val kwWhile: Parser[String] = "while\\b".r

val reserved: Parser[String] = ( kwIf | kwFor | kwWhile )

val identifier: Parser[String] = not(reserved) ~> name


Similar to the answer from @randall-schulz, but use an explicit negative lookahead in the regular expression itself.

Here, empty is a keyword but empty? should be an identifier. The negative lookahead fails the match (without consuming the characters) if empty is followed by anything in nameCharsRE. The kw helper function is used for multiple such keywords:

  val nameCharsRE = "[^\\s\",'`()\\[\\]{}|;#]"

  private def kw(kw: String, token: Token) = positioned {
    (s"${kw}(?!${nameCharsRE})").r ^^ { _ => token }
  }
  private def empty        = kw("empty", EMPTY_KW())
  private def and          = kw("and", AND())
  private def or           = kw("or", OR())
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜