Accessing Scala Parser regular expression match data

2022-12-12 15:44 问答作者：

I wondering if it's possible to get the MatchData generated from the matching regular expression in the grammar below.

object DateParser extends JavaTokenParsers {

    ....

    val dateLiteral = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r ^^ {
        ... get MatchData
    }
}

One option of course is to perform the match again inside the block, but since the RegexParser has already performed the match I'm hopi开发者_StackOverflow社区ng that it passes the MatchData to the block, or stores it?

Here is the implicit definition that converts your Regex into a Parser:

  /** A parser that matches a regex string */
  implicit def regex(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
        case Some(matched) =>
          Success(source.subSequence(start, start + matched.end).toString, 
                  in.drop(start + matched.end - offset))
        case None =>
          Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
      }
    }
  }

Just adapt it:

object X extends RegexParsers {
  /** A parser that matches a regex string and returns the Match */
  def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
        case Some(matched) =>
          Success(matched,
                  in.drop(start + matched.end - offset))
        case None =>
          Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
      }
    }
  }
  val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
}

Example:

scala> X.parseAll(X.t, "23/03/1971")
res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)

No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:

http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55

You have a couple of other options, though:

break up your parser into several smaller parsers (for the tokens you actually want to extract)
define a custom parser that extracts the values you want and returns a domain object instead of a string

The first would look like

val separator = "-" | "/"
  val year = ("""\d{4}"""r) <~ separator
  val month = ("""\d\d"""r) <~ separator
  val day = """\d\d"""r

  val date = ((year?) ~ (month?) ~ day) map {
    case year ~ month ~ day =>
      (year.getOrElse("2009"), month.getOrElse("11"), day)
  }

The <~ means "require these two tokens together, but only give me the result of the first one.

The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object.

The ? means that the parser is optional and will return an Option.

The .getOrElse bit provides a default value for when the parser didn't define a value.

When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to appoly that Regex to the input. The Match instance yielded upon successful application of the RE at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.

As it stands (in the 2.7 source I looked at), you're out of luck, I believe.

I ran into a similar issue using scala 2.8.1 and trying to parse input of the form "name:value" using the RegexParsers class:

package scalucene.query

import scala.util.matching.Regex
import scala.util.parsing.combinator._

object QueryParser extends RegexParsers {
  override def skipWhitespace = false

  private def quoted = regex(new Regex("\"[^\"]+"))
  private def colon = regex(new Regex(":"))
  private def word = regex(new Regex("\\w+"))
  private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
  private def term = (fielded | word | quoted)

  def parseItem(str: String) = parse(term, str)
}

It seems that you can grab the matched groups after parsing like this:

QueryParser.parseItem("nameExample:valueExample") match {
  case QueryParser.Success(result:scala.util.parsing.combinator.Parsers$$tilde, _) => {
      println("Name: " + result.productElement(0) + " value: " + result.productElement(1))
  }
}

继续阅读：parser-combinators scala

Accessing Scala Parser regular expression match data

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？