Turning a list/sequence of combinator parsers into a single one

2023-04-11 12:41 问答作者：

I have a list of values from which I开发者_如何学JAVA can construct a list of parsers, that depend on these values by mapping (see example). Then what I want to do is turn the list of parsers into a single parser by concatenation.

One possibility is using foldLeft and ~:

parsers.foldLeft(success(Nil)){case (ps,p) => rs ~ p ^^ {case xs ~ x => x ::xs}} ^^ (_.reverse)

Is this efficient?

I don't know how combinator parsers work; will there be a call stack with depth of length of the list? Thus may I run into SO errors for very long concatenations?

Better way

Is there a different way that is more readable?

Example

Suppose you have a file with two lines. The first line contains n integers x_1 to x_n. The second line contains contains x_1 + x_2 + ... x_n integers that belong to groups according to the first line. I want to take the sequence of integers from the first line and create n parsers p_1 to p_n where p_i parses x_i integers.

Suppose I have the list of integers l = List(1,2,3) from the first line. For each integer n I create a parser that parses n integers: parsers = l.map(repN(_,integer)).

What you're describing (and what you've more or less reinvented in your implementation with foldLeft and ~) is essentially Haskell's sequence for monads (really you only need an applicative functor, but that's irrelevant here). sequence takes a list of monadic values and returns a monadic list of values. Parser is a monad, so sequence for Parser would change a List[Parser[A]] into a Parser[List[A]].

Scalaz gives you sequence, but off the top of my head I don't know if there's a nice way to get the necessary Applicative instance for Parser. Fortunately you can roll your own pretty easily (I'm directly translating the Haskell definition):

import scala.util.parsing.combinator._

object parser extends RegexParsers {
  val integer = """\d+""".r

  val counts = List(1, 2, 3)
  val parsers = counts.map(repN(_, integer))

  val line = parsers.foldRight(success(Nil: List[List[String]])) {
    (m, n) => for { x <- m ; xs <- n } yield (x :: xs)
  }

  def apply(s: String) = parseAll(line, s)
}

This gives us List(List(1), List(2, 3), List(4, 5, 6)) for parser("1 2 3 4 5 6"), as desired.

(Note that I'm using RegexParsers here as a convenient complete example, but the approach works more generally.)

What's going on might be a little clearer if we desugar the for comprehension:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current.flatMap(x => acc.map(x :: _))
}

We can write flatMap as into and map as ^^:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current into (x => acc ^^ (x :: _))
}

This isn't too far from your formulation, except that we're using a right fold instead of reversing and aren't building up and breaking down the ~s.

About efficiency: Both of our implementations are going to result in unpleasant call stacks. In my experience this is just a fact of life with Scala's parser combinators. To quote another Stack Overflow answer, for example:

Scala's parser combinators aren't very efficient. They weren't designed to be. They're good for doing small tasks with relatively small inputs.

My sequence-y approach addresses the "more readable" part of your question, and is almost certainly the cleanest way to solve the problem with Scala's parser combinators. It's marginally more efficient than your implementation, and should be fine for a few thousand groups or so. If you need to handle more than that, you'll have to look outside of scala.util.parsing.combinator. I'd recommend something like the following:

def parse(counts: Seq[Int], input: String): Option[Seq[Seq[Int]]] = {
  val parsed = try {
    Some(input.split(" ").map(_.toInt))
  } catch {
    case _ : java.lang.NumberFormatException => None
  }

  parsed.flatMap { ints =>
    if (ints.length != counts.sum) None
    else Some(counts.foldLeft((Seq.empty[Seq[Int]], ints)) {
      case ((collected, remaining), count) => {
        val (m, n) = remaining.splitAt(count)
        (m.toSeq +: collected, n)
      }
    }._1.reverse)
  }
}

No guarantees, but on my system it doesn't overflow on a line with 100k integer groups.

Have you considered using a RegexParsers (in scala.util.parsing.combinator)? Then you can use regular expressions as parsers, which will compute very fast and be easy to write.

For example, if you are using parser combinators to parse an AST for simple arithmatic, you might use regular expressions to interpret tokens that refer to objects so you can parse expressions like appleList.size + 4.

Here is a rather trivial example, but it shows how regular expressions can be combined by parser combinators.

object MyParser extends RegexParsers {
  val regex1 = """[abc]*""".r
  val regex2 = """[def]*""".r
  val parse = regex1 ~ regex2

  def apply(s: String) = parseAll(parse, s)
}

继续阅读：parser-combinators scala

Turning a list/sequence of combinator parsers into a single one

Is this efficient?

Better way

Example

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Is this efficient?

Better way

Example

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？