
With Parsec, how do I parse zero or more foo1 terminated by foo2 and all separated by dot?

What I am trying to do seems pretty simple, but since I am a parsec Haskell newb, the solution is eluding me.

I have two parsers, say foo1 and foo2, where foo1 parses an intermediate term and foo2 parses an ending term. Terms are separated by a symbol, ".".

Sentences that I need to parse are

  • foo2
  • foo1.foo2
  • foo1.foo1.foo2

and so on.

My original thought was to do

do k <- sepBy foo1 (char '.')
   j <- foo2

but that wouldn't catch the foo2-only case.


You want endBy, not sepBy.

foo = do k <- foo1 `endBy` char '.'
         j <- foo2
         ... 

That will force the separator to be present after each occurrence of foo1.

Of course, endBy is trivially replaceable by many, which may be clearer.

foo = do k <- many $ foo1 <* char '.' 
         j <- foo2
         ...

or, without Control.Applicative:

foo = do k <- many $ do x <- foo1; char '.'; return x
         j <- foo2
         ...
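To make the snippets above something you can load into GHCi, here is a minimal sketch. The concrete definitions of foo1 and foo2 (the literal strings "foo1" and "foo2") and the name sentence are assumptions made up for illustration, since the question doesn't say what the real parsers look like. Because the two literals share a prefix, foo1 is wrapped in try so that it doesn't consume input when it hits the final foo2; your real foo1 may or may not need that.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the question's parsers (not from the post).
-- try lets foo1 fail on "foo2" without consuming the shared "foo" prefix.
foo1 :: Parser String
foo1 = try (string "foo1")

foo2 :: Parser String
foo2 = string "foo2"

-- Zero or more foo1, each followed by a dot, then exactly one foo2.
sentence :: Parser ([String], String)
sentence = do
  ks <- foo1 `endBy` char '.'
  j  <- foo2
  eof                      -- reject trailing input
  return (ks, j)
```

With these definitions, parse sentence "" accepts "foo2", "foo1.foo2" and "foo1.foo1.foo2", and rejects "foo1" on its own because the dot after foo1 is mandatory.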


First, you want endBy instead of sepBy:

do k <- endBy foo1 (char '.')
   j <- foo2

Second, it would catch the foo2-only case.

From the documentation:

endBy p sep parses zero or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.
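The difference between the two combinators is easy to see side by side. This is just a sketch: item (one or more letters), separated and terminated are names I made up for the comparison, not anything from your code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical item parser: one or more letters.
item :: Parser String
item = many1 letter

-- sepBy: the separator appears only between items.
separated :: Parser [String]
separated = item `sepBy` char '.' <* eof

-- endBy: the separator must follow every item, including the last.
terminated :: Parser [String]
terminated = item `endBy` char '.' <* eof
```

Here separated accepts "a.b" but not "a.b.", while terminated accepts "a.b." but not "a.b". Both accept the empty string and return an empty list, which is why the foo2-only case still works when foo2 follows.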


Try something like

many (foo1 >>= (\v -> char '.' >> return v)) >>= \v1 ->
  foo2 >>= \v2 ->
  -- ...
  -- combine v1 & v2 somehow

(Just a sketch, of course.)

In general, the many combinator is Parsec's equivalent of Kleene star; and if you're going to add something simple like a trailing dot to an existing parser, using >> / >>= may actually be cleaner and simpler than using do notation.
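One thing to watch with the many approach: if foo1 can also match the final term, the last iteration will parse that term, fail on the missing dot, and leave Parsec with consumed input. A possible way around it, sketched below under the assumption that both kinds of term are plain words (term and dotted are hypothetical names), is to wrap each "term plus dot" step in try so it backtracks as a unit.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical: intermediate and final terms are both plain words,
-- so the intermediate parser would happily match the final term too.
term :: Parser String
term = many1 letter

-- Zero or more "term." prefixes, then one final term.
dotted :: Parser ([String], String)
dotted = do
  initial <- many (try (term <* char '.'))  -- backtrack if the dot is missing
  final   <- term
  eof
  return (initial, final)
```

Without the try, an input such as "a.b.c" would be rejected, because the third pass through many consumes "c" before discovering there is no dot after it.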


Sure, it would catch the foo2 case. Using Leiden's word for your foo1:

let a = sepBy word (char '.')
parseTest a "foo.bar.baz"
parseTest a "foo"
parseTest a ".baz"