Regex & String Libraries in Haskell
I'm trying to introduce Haskell into my daily life by using it to write incidental scripts and such.
readProcess is handy for getting the results of external commands, but I find myself searching when it comes to processing the String results. I'm coming from Ruby, where regexes are first-class, so I'm used to having them as a tool.
Any libraries I should read up on to do string processing in Haskell? Searching for matching lines, pulling out matching regions of a string, and such?
I found this to be a good starting point: http://www.serpentine.com/blog/2007/02/27/a-haskell-regular-expression-tutorial/ It only covers the basics, no advanced topics, but it's great to get started IMHO.
Things to note:
- Regexes in Haskell are different in that they have overloaded return types. This means that you can pull many different kinds of things out of a regex match (Bool, String, [String], etc.). Depending on the return type you ask for, you get back a different kind of answer (whether or not the regex matched, the text of the match, all matching subgroups, etc.). This is done using some fairly complex typeclass voodoo. The above link demonstrates the basic kinds; a more complete list is here
- There are actually multiple standard modules in Haskell that provide regex support (strange but true). The tutorial above shows the POSIX module, because it comes standard in Haskell. If you have cabal, you can also pretty easily install other regex modules and use those instead. There's a pcre binding (regex-pcre), as well as some packages that work via DFAs (regex-dfa, among others). Install using a command like: cabal install regex-pcre and you should be good to go. (The modules have a standardized interface; the difference is mainly in the implementation and the regex flavor.)
- There IS a regex object in Haskell, but you don't really need it to use the =~ or =~~ match operators. (Just use a string; conversion happens automatically.) If your task is complicated enough that you want a first-class parsing object, consider looking into Parsec, as has been mentioned in other answers.
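To make the overloaded return types a bit more concrete, here is a minimal sketch. It assumes the regex-posix package is installed (it provides Text.Regex.Posix), and the sample string and the names below are made up purely for illustration. The same pattern syntax and the same =~ operator are used throughout; only the type annotation changes.

```haskell
import Text.Regex.Posix ((=~))

-- Hypothetical sample input, chosen only for illustration.
sample :: String
sample = "eth0 up 192.168.1.10, eth1 down 10.0.0.5"

-- Bool: did the pattern match anywhere in the string?
hasDown :: Bool
hasDown = sample =~ "down"

-- String: the text of the first match (empty when nothing matches).
firstIface :: String
firstIface = sample =~ "eth[0-9]+"

-- (before, match, after): the first match together with its surroundings.
aroundFirst :: (String, String, String)
aroundFirst = "key=value" =~ "="

-- [[String]]: every match, each given as the whole match followed by its subgroups.
ifaceStates :: [[String]]
ifaceStates = sample =~ "(eth[0-9]+) (up|down)"
```

Note that the patterns are plain strings, so no explicit regex object is built anywhere. Which result types are available depends on the version of the regex packages you have, so treat the four above as a starting set rather than the full list.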
DISCLAIMER: I only really use pcre myself, so I don't know much about the other packages.
When I was first teaching myself Haskell I found that learning to use a parser combinator library for string processing was a fantastic investment. They can do everything regular expressions can do, and much more, and writing combinator parsers is a great way to build up intuitions about type classes like monads, applicative functors, etc.
I tend to use Attoparsec these days, but Parsec is probably a better starting point because it's more widely documented and discussed, provides nicer error messages, etc.
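For a rough idea of what the combinator style looks like, here is a small Parsec sketch. The "key=value" line format and the names keyValue and parseKeyValue are hypothetical, picked only to have something to parse; a real script would define whatever grammar its input actually has.

```haskell
import Text.Parsec (ParseError, char, letter, many, many1, noneOf, parse, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical "key=value" format, chosen only for illustration.
keyValue :: Parser (String, String)
keyValue = do
  key   <- many1 (letter <|> char '_')  -- one or more letters or underscores
  _     <- char '='                     -- the separator itself is discarded
  value <- many (noneOf "\n")           -- everything up to the end of the line
  return (key, value)

-- Run the parser; the "<input>" argument is only a label used in error messages.
parseKeyValue :: String -> Either ParseError (String, String)
parseKeyValue = parse keyValue "<input>"
```

Something like parseKeyValue "PATH=/usr/bin" should come back as a Right with the two halves, while input that does not start with a key comes back as a Left carrying a position and a description of what was expected. That Left is where the nicer error messages show up.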
A good introduction to regular expressions can be found in Real World Haskell.
Update: On a side note, for command processing, pipes and such, check out HSH.
There are plenty of great regex libs in Haskell, but we have better tools. Let's stick with standard Haskell Strings for now (i.e. lists of Char). The basics are all in Data.List -- http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.3.0.0/Data-List.html. You have lines, unlines, words, unwords, takeWhile, dropWhile, etc. Also isPrefixOf and isInfixOf, etc.
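As a sketch of how far the plain list functions get you for the two tasks in the question (finding matching lines, pulling out a region of a line), something like the following works with nothing beyond base. The function names matchingLines and valuesAfter are made up for this example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isInfixOf, stripPrefix)
import Data.Maybe (mapMaybe)

-- Keep only the lines that contain the given substring (a grep-like filter).
matchingLines :: String -> String -> [String]
matchingLines needle = filter (needle `isInfixOf`) . lines

-- For every line that starts with the given prefix, return the rest of the
-- line with leading and trailing whitespace trimmed; other lines are dropped.
valuesAfter :: String -> String -> [String]
valuesAfter prefix = map trim . mapMaybe (stripPrefix prefix) . lines
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

Both take the search text first and the input second, so they can be applied directly to the String that readProcess hands back.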
You may end up writing your own recursive functions fairly directly, but that's a breeze too. The only really missing operations are splitting ones, for which you can use Brent's excellent package: http://hackage.haskell.org/package/split
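To give a feel for the "write your own recursive function" route, here is a hand-rolled splitter built on break. It is only a sketch for a single-character delimiter, and splitOnChar is a name invented here, not part of the split package, which covers far more cases.

```haskell
-- Split a String on a single delimiter character, keeping empty fields.
splitOnChar :: Char -> String -> [String]
splitOnChar delim s = case break (== delim) s of
  (field, [])       -> [field]                          -- no delimiter left: last field
  (field, _ : rest) -> field : splitOnChar delim rest   -- drop the delimiter and recurse
```

For instance, splitOnChar ',' "a,b,,c" yields four fields, the third of them empty.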
Fundamentally, the notion is that you want to do incremental processing of streams of characters.
Not everything is as efficient as possible, especially since the string representation is not that efficient. But if/when you move on to other data types, the core concepts of how you process things will translate directly from basic strings.