With Parsec, how do I parse zero or more foo1 terminated by foo2 and all separated by dot?
What I am trying to do seems pretty simple, but since I am a Parsec/Haskell newbie, the solution is eluding me.
I have two parsers, let's say foo1 and foo2, where foo1 parses an intermediate term and foo2 parses an ending term. Terms are separated by a symbol, ".".
Sentences that I need to parse are
foo2
foo1.foo2
foo1.foo1.foo2
and so on.
My original thought was to do
    do k <- sepBy foo1 (char '.')
       j <- foo2
but that wouldn't catch the foo2-only case.
You want endBy, not sepBy.
    foo = do k <- foo1 `endBy` char '.'
             j <- foo2
             ...
That will force the separator to be present after each occurrence of foo1.
Of course, endBy is trivially replaceable by many, which may be clearer.
    foo = do k <- many $ foo1 <* char '.'
             j <- foo2
             ...
or, without Control.Applicative:
    foo = do k <- many $ do x <- foo1; char '.'; return x
             j <- foo2
             ...
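For reference, here is a minimal runnable sketch of both variants. The definitions of foo1 and foo2 are assumptions made up for illustration (lowercase words and digit runs), since the question does not say what they parse; `run` and the `sentence...` names are likewise hypothetical helpers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the question's parsers (assumptions):
-- foo1 accepts a run of lowercase letters, foo2 a run of digits.
foo1 :: Parser String
foo1 = many1 lower

foo2 :: Parser String
foo2 = many1 digit

-- Zero or more foo1, each followed by a dot, then exactly one foo2.
sentenceEndBy :: Parser ([String], String)
sentenceEndBy = do
  ks <- foo1 `endBy` char '.'
  j  <- foo2
  eof
  return (ks, j)

-- The same grammar written with many instead of endBy.
sentenceMany :: Parser ([String], String)
sentenceMany = do
  ks <- many (foo1 <* char '.')
  j  <- foo2
  eof
  return (ks, j)

-- Run a parser on a string, discarding the error message on failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""

main :: IO ()
main = do
  print (run sentenceEndBy "42")     -- foo2 only
  print (run sentenceEndBy "a.42")   -- one foo1, then foo2
  print (run sentenceMany "a.b.42")  -- two foo1, then foo2
  print (run sentenceMany "a.b")     -- no terminating foo2, so this fails
```

One caveat with this sketch: it relies on foo1 failing without consuming input when it meets the start of foo2. If your real foo1 and foo2 share a prefix, you would probably need to wrap the repeated part in try.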
First, you want endBy instead of sepBy:
    do k <- endBy foo1 (char '.')
       j <- foo2
Second, it would catch the foo2-only case. From the documentation:
    endBy p sep parses zero or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.
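To see where sepBy and endBy actually differ, here is a small comparison sketch. As before, the concrete foo1 and foo2 (lowercase words and digit runs) are assumptions for illustration only, and `withSepBy`, `withEndBy` and `run` are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the question's parsers (assumptions).
foo1, foo2 :: Parser String
foo1 = many1 lower
foo2 = many1 digit

-- The question's original attempt: sepBy, then foo2.
withSepBy :: Parser ([String], String)
withSepBy = (,) <$> sepBy foo1 (char '.') <*> foo2

-- The suggested fix: endBy, then foo2.
withEndBy :: Parser ([String], String)
withEndBy = (,) <$> endBy foo1 (char '.') <*> foo2

-- Run a parser on a string, discarding the error message on failure.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""

main :: IO ()
main = do
  -- Both accept the foo2-only sentence, because both allow zero foo1.
  print (run withSepBy "42")
  print (run withEndBy "42")
  -- sepBy consumes the dot and then insists on another foo1, so it
  -- fails here; endBy treats the dot as the end of that foo1.
  print (run withSepBy "a.42")
  print (run withEndBy "a.42")
```

With these stand-in parsers, the sepBy version handles the foo2-only input but breaks as soon as a foo1 is followed by ".foo2", which is the case endBy is meant for.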
Try something like
    many (foo1 >>= (\v -> char '.' >> return v)) >>= \v1 ->
    foo2 >>= \v2 ->
    -- ...
    -- combine v1 & v2 somehow
(Just a sketch, of course.)
In general, the many combinator is Parsec's equivalent of Kleene star; and if you're going to add something simple like a trailing dot to an existing parser, using >> / >>= may actually be cleaner and simpler than using do notation.
Sure, it would catch the foo2 case. Using Leijen's word for your foo1:
    let a = sepBy word (char '.')
    parseTest a "foo.bar.baz"
    parseTest a "foo"
    parseTest a ".baz"