Implementing Read typeclass where parsing strings includes "$"

Posted 2023-04-05 10:19

I've been playing with Haskell for about a month. For my first "real" Haskell project I'm writing a part-of-speech tagger. As part of this project I have a type called Tag that represents a part-of-speech tag, implemented as follows:

data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS ...

The above is a long list of standardized part-of-speech tags which I've intentionally truncated. However, in this standard set of tags there are two that end in a dollar sign ($): PRP$ and NNP$. Because data constructor names can't contain $, I've elected to rename them PRPS and NNPS.

This is all well and good, but I'd like to read tags from strings in a lexicon and convert them to my Tag type. Trying this fails:

instance Read Tag where
    readsPrec _ input =
        (\inp -> [((NNPS), rest) | ("NNP$", rest) <- lex inp]) input

The Haskell lexer chokes on the $: lex never returns "NNP$" as a single token. Any ideas how to pull this off?
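A quick way to see what "chokes" means here is to ask lex directly. This tiny program is only an illustration (it is not part of the tagger) and uses nothing beyond the Prelude:

```haskell
-- Illustration only: how the Prelude's `lex` tokenises a tag ending in '$'.
main :: IO ()
main = do
  -- "NNP" is an identifier; '$' is not an identifier character,
  -- so it is left over in the remaining input.
  print (lex "NNP$")  -- [("NNP","$")]
  -- On its own, '$' is lexed as an operator symbol.
  print (lex "$")     -- [("$","")]
```

Since lex never yields "NNP$" as one token, the pattern ("NNP$", rest) in the list comprehension can never match, and readsPrec returns no parses at all.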

Implementing Show was fairly straightforward. It would be great if there were some similar strategy for Read.

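-- Assumes DeriveDataTypeable is enabled and: data Tag = ... deriving (Data)
import Data.Data (Data, toConstr, showConstr)

instance Show Tag where
    showsPrec _ NNPS = showString "NNP$"
    showsPrec _ PRPS = showString "PRP$"
    -- `shows tag` would re-enter this instance and loop forever; use the Data name
    showsPrec _ tag  = showString (showConstr (toConstr tag))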

You're abusing Read here.

Show and Read are meant to print and parse values as valid Haskell expressions, to enable debugging, etc. This doesn't always work perfectly (e.g. if you import Data.Map qualified and then call show on a Map value, the call to fromList isn't qualified), but it's a valid starting point.

If you want to print or parse your values to match some specific format, then use a pretty-printing library for the former and an actual parsing library (e.g. uu-parsinglib, polyparse, parsec) for the latter. They typically have much nicer support for parsing than ReadS (though ReadP in GHC isn't too bad).
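As a minimal sketch of that advice using ReadP from base (so no extra package), printing and parsing can live in ordinary functions instead of Show/Read instances. The five-constructor Tag, the names renderTag, tagP, tagsP and parseTags, and the assumption that a lexicon field holds whitespace-separated tags are all hypothetical:

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- Truncated, illustrative tag set; PRPS and NNPS stand for PRP$ and NNP$.
data Tag = CC | CD | DT | NNPS | PRPS
  deriving (Show, Eq, Enum, Bounded)

-- Treebank spelling of a tag. The derived Show stays available for
-- debugging; only the two '$' tags need special cases.
renderTag :: Tag -> String
renderTag NNPS = "NNP$"
renderTag PRPS = "PRP$"
renderTag t    = show t

-- One tag: a run of non-space characters, looked up in a table built
-- from renderTag, so printing and parsing cannot drift apart.
tagP :: ReadP Tag
tagP = do
  tok <- munch1 (not . isSpace)
  maybe pfail return (lookup tok table)
  where
    table = [(renderTag t, t) | t <- [minBound .. maxBound]]

-- Zero or more whitespace-separated tags, and nothing else.
tagsP :: ReadP [Tag]
tagsP = skipSpaces *> sepBy tagP skipSpaces <* skipSpaces <* eof

parseTags :: String -> Maybe [Tag]
parseTags s = case readP_to_S tagsP s of
  [(tags, "")] -> Just tags
  _            -> Nothing

main :: IO ()
main = do
  print (parseTags "CC PRP$ NNP$")  -- Just [CC,PRPS,NNPS]
  print (parseTags "CC XYZ")        -- Nothing
```

Deriving Enum and Bounded lets the lookup table enumerate every constructor, so a new tag only needs a case in renderTag when its spelling differs from its constructor name. Because munch1 always takes a whole token, overlapping spellings such as CC and a longer tag starting with CC cannot be confused.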

You may argue that this isn't necessary because this is just a quick'n'dirty hack, but quick'n'dirty hacks have a tendency to linger. Do yourself a favour and do it right the first time: it means there's less to rewrite when you want to do it "properly" later on.

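Then don't use the Haskell lexer. In GHC, the read functions are built on the ReadP and ReadPrec parser combinators from base (Text.ParserCombinators.ReadP and Text.ParserCombinators.ReadPrec), not on Parsec, although the style is very similar; the Parsec chapter of the Real World Haskell book is an excellent introduction to it.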

Here's some code that seems to work:

import Text.Read                              -- Read(..), ReadPrec, choice, lift
import Text.ParserCombinators.ReadP (string)

data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS deriving (Show)

-- Turn each (text, constructor) pair into a parser that matches the
-- text literally and then returns the constructor.
strValMap :: [(String, a)] -> [ReadPrec a]
strValMap = map (\(x, y) -> lift (string x) >> return y)

instance Read Tag where
    readPrec = choice $ strValMap [
        ("CC", CC),
        ("CD", CD),
        ("JJ$", JJS)
        ]

Then run it with:

read "JJ$" :: Tag

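The code is fairly self-explanatory. The parser string x matches the literal text x; if it succeeds, y is returned, and if it fails it simply yields no result (no exception is thrown). choice combines all of these alternatives. ReadP explores every alternative rather than committing to the first one that matches, and read keeps only parses that consume the whole input (apart from trailing whitespace). So if you add a CCC constructor, the CC alternative matching the first two characters of "CCC" is discarded because a "C" is left over, and the CCC alternative wins. If you don't need this and want the first successful alternative to win, use the left-biased <++ instead; note that <|> on ReadP and ReadPrec is the same symmetric choice as +++, not a left-biased one.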
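To see the difference concretely, here is a sketch that extends the instance above with a hypothetical CCC constructor (not a real tag) and compares symmetric choice with ReadP's left-biased <++. It also wraps the parser in parens and skips leading spaces; that is an addition beyond the answer, assumed here so that "(JJ$)" and list syntax such as "[CC, JJ$]" parse too:

```haskell
import Text.Read  -- Read(..), ReadPrec, parens, choice, lift, readListPrecDefault
import qualified Text.ParserCombinators.ReadP as P

-- CCC is hypothetical (not a real tag); it exists only to overlap with CC,
-- just as "JJ" overlaps with "JJ$".
data Tag = CC | CCC | CD | JJ | JJS deriving (Show, Eq)

tagTable :: [(String, Tag)]
tagTable = [("CC", CC), ("CCC", CCC), ("CD", CD), ("JJ", JJ), ("JJ$", JJS)]

instance Read Tag where
  -- `parens` also accepts "(JJ$)"; skipping leading spaces lets list
  -- syntax such as "[CC, JJ$]" work, since later elements follow ", ".
  readPrec = parens $ do
    lift P.skipSpaces
    choice [lift (P.string s) >> return t | (s, t) <- tagTable]
  readListPrec = readListPrecDefault

-- Left-biased alternative: once "CC" succeeds, "CCC" is never tried.
ccFirst :: P.ReadP Tag
ccFirst = (P.string "CC" >> return CC) P.<++ (P.string "CCC" >> return CCC)

main :: IO ()
main = do
  print (read "CCC" :: Tag)          -- CCC: the "CC" parse leaves "C" and is dropped
  print (read " (JJ$)" :: Tag)       -- JJS
  print (read "[CC, JJ$]" :: [Tag])  -- [CC,JJS]
  print (P.readP_to_S ccFirst "CCC") -- [(CC,"C")]: no full parse, so read would fail
```

With symmetric choice the overlapping prefixes are harmless, because read discards the parses that leave input behind; with <++ the parser commits to CC and "CCC" can no longer be read.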

Tags: haskell, linguistics
