
Which Haskell parsing technology is most pleasant to use, and why? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.

Closed 8 years ago.

"Pleasant" meaning, for example: you can write grammars in a "natural" way without having to rewrite them in a convoluted way, and without having to introduce boring boilerplate.

Let's stipulate for the purposes of this question that, unless the performance of a technology is pathologically bad, performance isn't the biggest issue here.

Although, having said that, you might want to mention if a technology falls down when it comes to having to rewrite a grammar for performance reasons.

Please give me an idea of the size and complexity of grammars you have worked with, when answering this question. Also, whether you have used any notable "advanced" features of the technology in question, and what your impressions of those were.

Of course, the answer to this question may depend on the domain, in which case, I'd be happy to learn this fact.


It really depends on what you start with and what you want to do. There isn't a one-size-fits-all answer.

If you have an LR grammar (e.g. you are working from a Yacc grammar), it is a good deal of work to turn it into an LL one suitable for Parsec or uu-parsinglib. The many, sepBy, etc. combinators are very helpful there, but you should expect the parser to be slower than Happy+Alex.
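To illustrate the many/sepBy style of combinator mentioned above, here is a minimal sketch using ReadP from base (Parsec's sepBy has the same shape); the comma-separated-integer grammar is just an invented example:

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isDigit)

-- One unsigned integer: one or more digits (maximal munch).
integer :: ReadP Int
integer = read <$> munch1 isDigit

-- A comma-separated list of integers, e.g. "1,2,3".
intList :: ReadP [Int]
intList = integer `sepBy` char ','

-- Run the parser, requiring all input to be consumed.
parseInts :: String -> Maybe [Int]
parseInts s = case readP_to_S (intList <* eof) s of
                [(xs, "")] -> Just xs
                _          -> Nothing

-- parseInts "1,2,3"  ==  Just [1,2,3]
```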

For LL combinator parsing, uu-parsinglib and its predecessor uu-parsing are nice, but they lack something like Parsec's Token and Language modules, so they are perhaps less convenient. Some people like Malcolm Wallace's Parselib because it has a different backtracking model from Parsec, but I've no experience of it.

If you are decoding a formatted file rather than something like a programming language, Attoparsec or similar might be better than Parsec or uu-parsinglib. "Better" in this context meaning faster: not just ByteString vs. Char, but Attoparsec also does less work on error handling and source-location tracking, so its parsers do less work per input element and should run faster.

Also, bear in mind that text file formats might not always have grammars as such, so you might have to define some custom combinators to do special lexical tricks rather than just define "parser combinators" for each element.
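As a sketch of such a custom lexical combinator, here is one for fixed-width fields, written with ReadP from base; the record layout (an 8-character name followed by a 4-digit amount) is hypothetical:

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isDigit)

-- A custom combinator: take exactly n characters as one "field",
-- regardless of content -- a lexical trick rather than a grammar rule.
field :: Int -> ReadP String
field n = count n get

-- A hypothetical fixed-width record: an 8-character, space-padded
-- name followed by a 4-digit amount.
record :: ReadP (String, Int)
record = do
  name   <- field 8
  digits <- count 4 (satisfy isDigit)
  return (takeWhile (/= ' ') name, read digits)

parseRecord :: String -> Maybe (String, Int)
parseRecord s = case readP_to_S (record <* eof) s of
                  [(r, "")] -> Just r
                  _         -> Nothing

-- parseRecord "alice   0042"  ==  Just ("alice", 42)
```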

For LR parsing, I found Ralf Hinze's Frown nicer than Happy: better error support and a nicer format for grammar files. However, Frown is not actively maintained and isn't on Hackage. I think it is LR(k) rather than LR(1), which means it is more powerful w.r.t. lookahead.

Performance is not really a big concern w.r.t. a grammar. Programming languages have complex grammars, but you can expect fairly small files. As for data file formats, it really behoves the designer of the format to design it in such a way that it allows efficient parsing. For combinator parsers you shouldn't need many advanced features for a data-format file; if you do, either the format is badly designed (this sometimes happens, unfortunately) or your parser is.

For the record, I've written a C parser with Frown, a GL shading language parser with Happy, an unfinished C parser with UU_Parsing, and many things with Parsec. The choice for me comes down to what I start with: given an LR grammar, Frown or Happy (now Happy, as Frown isn't maintained); otherwise usually Parsec (as I said, uu_parse is nice but lacks the convenience of LanguageDef). For binary formats I roll my own, but I usually have special requirements.


Recently, I recast a DSL parser in uu-parsinglib which had been written in parsec. I found that it greatly simplified the program. My main motivation was to get the auto-correcting aspect. That just works. It's practically free! Also, I much preferred writing my parser in an applicative style as opposed to the monadic style of Parsec.
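To make the style contrast concrete, here is one small parser written both ways, sketched with ReadP from base (the assignment grammar is an invented example):

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isAlpha, isDigit)

data Assign = Assign String Int deriving (Eq, Show)

ident :: ReadP String
ident = munch1 isAlpha

number :: ReadP Int
number = read <$> munch1 isDigit

-- Monadic style, as one would often write it in Parsec:
assignM :: ReadP Assign
assignM = do
  name <- ident
  _    <- char '='
  val  <- number
  return (Assign name val)

-- Applicative style: the grammar's shape is visible in one line.
assignA :: ReadP Assign
assignA = Assign <$> ident <* char '=' <*> number

runP :: ReadP a -> String -> Maybe a
runP p s = case readP_to_S (p <* eof) s of
             [(x, "")] -> Just x
             _         -> Nothing

-- runP assignA "x=42"  ==  Just (Assign "x" 42)
```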


We've had great success using uu-parsinglib - we've switched to it from Parsec as it's quite a bit more flexible and powerful - e.g. it can support lazy parsing if needed, and you don't need to explicitly use a combinator (like Parsec's try) to mark possible backtracking points.

It is true that at present you need to do a little more on the tokenizing side of things, but for us that's a small point relative to the fundamental strengths of the library.
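The backtracking point can be sketched with ReadP from base, which, like uu-parsinglib, backtracks through alternatives without an explicit try:

```haskell
import Text.ParserCombinators.ReadP

-- Two alternatives sharing a prefix. In Parsec, a naive
-- string "let" <|> string "letter" commits to the first branch once
-- it consumes input, so an explicit `try` (and careful ordering) is
-- needed; a fully backtracking library accepts the naive form.
keyword :: ReadP String
keyword = string "let" +++ string "letter"

runP :: ReadP a -> String -> Maybe a
runP p s = case readP_to_S (p <* eof) s of
             [(x, "")] -> Just x
             _         -> Nothing

-- runP keyword "letter"  ==  Just "letter"
-- runP keyword "let"     ==  Just "let"
```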

