Haskell: Escaped character from character

2023-01-12 17:23

I'm writing a parsec parser which reads in strings and converts escaped characters, as part of exercise 3 here.

For that exercise I am using this function:

escapedCharFromChar :: Char -> Char
escapedCharFromChar c = read $ concat ["'\\",[c],"'"]

I am not too impressed with the use of read to convert the character x into the escape character with the name x. Can anyone suggest a more elegant function of type Char -> Char to do this?

One way is to lay out the cases exhaustively:

charFromEscape :: Char -> Char
charFromEscape 'n' = '\n'
charFromEscape 't' = '\t'
--- ... --- Help!

You could also use lookup:

-- this import goes at the top of your source file
import Data.Maybe (fromJust)

charFromEscape :: Char -> Char
charFromEscape c = fromJust $ lookup c escapes
  where escapes = [('n', '\n'), ('t', '\t')] -- and so on

The fromJust bit may look strange. The type of lookup is

lookup :: (Eq a) => a -> [(a, b)] -> Maybe b

This means that, given a key of some type with equality defined and a lookup table, lookup tries to give you the corresponding value from the table. Your key isn't guaranteed to be present in the table, though! That's the purpose of Maybe, whose definition is

data Maybe a = Just a | Nothing

fromJust assumes you got Just something (i.e., c has an entry in escapes), and it falls apart when that assumption is invalid:

ghci> charFromEscape 'r'
*** Exception: Maybe.fromJust: Nothing

These examples will move you along in the exercise, but it's clear that you'd like better error handling. Also, if you expect the lookup table to be large, you may want to look at Data.Map.
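As a rough sketch of both suggestions at once, here is a Data.Map version that returns Maybe instead of calling fromJust. The names escapeTable and safeCharFromEscape are made up for illustration, and the table is deliberately incomplete.

```haskell
import qualified Data.Map as Map

-- Hypothetical table name; extend it with whatever escapes your grammar needs.
escapeTable :: Map.Map Char Char
escapeTable = Map.fromList
  [ ('n', '\n')
  , ('t', '\t')
  , ('r', '\r')
  , ('\\', '\\')
  , ('"', '"')
  ]

-- Hypothetical name. Returns Nothing for an unknown escape instead of throwing.
safeCharFromEscape :: Char -> Maybe Char
safeCharFromEscape c = Map.lookup c escapeTable
```

The caller then decides what an unknown escape means, for example by turning Nothing into a parse error.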

read (or rather, Text.Read.Lex.lexCharE) is how you get at GHC's internal table, which is defined as:

 lexEscChar =
   do c <- get
      case c of
        'a'  -> return '\a'
        'b'  -> return '\b'
        'f'  -> return '\f'
        'n'  -> return '\n'
        'r'  -> return '\r'
        't'  -> return '\t'
        'v'  -> return '\v'
        '\\' -> return '\\'
        '\"' -> return '\"'
        '\'' -> return '\''
        _    -> pfail

Eventually, you have to define the semantics somewhere. You can do it in your program, or you can reuse GHC's.
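If you do want to reuse GHC's table, a slightly safer variant of the original function is possible with readMaybe from Text.Read. This is only a sketch, and readEscape is a made-up name. Note that it accepts everything the Read instance for Char accepts after a backslash, including numeric escapes, which may be more than your grammar wants.

```haskell
import Text.Read (readMaybe)

-- Hypothetical name. Builds the source text of a character literal such as
-- '\n' and lets the Read instance for Char decode it. An unrecognised
-- escape gives Nothing rather than a runtime error.
readEscape :: Char -> Maybe Char
readEscape c = readMaybe ("'\\" ++ [c] ++ "'")
```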

I just used pattern matching for the few escapes I cared about, i.e. 't' -> '\t' etc. The solutions other readers suggested were similar. Not very generic, but very straightforward.

You should consider implementing a proper Parser Char function instead of a Char -> Char. (Or, if you're doing that anyway, consider using a Char -> Maybe Char instead.)

The Char -> Char approach only works if every escape sequence consists of a backslash and a single other character. Some languages have more complex character escapes that consist of a longer sequence of characters. For example, C++ supports multi-character escape sequences such as \u005C (which represents the Unicode code point U+005C).

parseEscapeSequence :: Parser Char
parseEscapeSequence = do
    c <- anyChar  -- Parsec's counterpart of ReadP's get
    case c of
        '\\' -> return '\\'
        '0' -> return '\0'
        't' -> return '\t'
        'f' -> return '\f'
        'r' -> return '\r'
        'n' -> return '\n'
        -- ...
        'u' -> parseUnicodeEscape4
        'U' -> parseUnicodeEscape8
        -- ...
        _ -> fail "Unrecognised escape sequence"

Here parseUnicodeEscape4 and parseUnicodeEscape8 would each parse a fixed number of hexadecimal digits and convert them into a Unicode character, likely by first converting each digit into an integer in the 0..15 range, then combining those 'nibbles' into a larger integer, and finally converting that integer into a character.
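One possible way to write those two parsers with Parsec is sketched below. The helpers hexCodePoint and toChar are made-up names, and the digit counts (four and eight) are assumed from the parser names. The range check is there because chr throws on values outside 0..0x10FFFF.

```haskell
import Data.Char (chr, digitToInt)
import Data.List (foldl')
import Text.Parsec (count, hexDigit, parse)  -- parse is only needed to run the parsers
import Text.Parsec.String (Parser)

-- Hypothetical helper: parse exactly n hex digits and fold the nibbles
-- into a single integer, most significant digit first.
hexCodePoint :: Int -> Parser Int
hexCodePoint n = do
  digits <- count n hexDigit
  return (foldl' (\acc d -> acc * 16 + digitToInt d) 0 digits)

-- Hypothetical helper: fail the parse instead of letting chr throw.
toChar :: Int -> Parser Char
toChar cp
  | cp >= 0 && cp <= 0x10FFFF = return (chr cp)
  | otherwise                 = fail "Code point out of range"

parseUnicodeEscape4 :: Parser Char
parseUnicodeEscape4 = hexCodePoint 4 >>= toChar

parseUnicodeEscape8 :: Parser Char
parseUnicodeEscape8 = hexCodePoint 8 >>= toChar
```

For example, parse parseUnicodeEscape4 "" "005C" gives Right '\\', while parse parseUnicodeEscape8 "" "FFFFFFFF" gives a Left with the out-of-range message.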

You could alternatively offload the simple escape sequences to another function that does pattern matching, but that function should ideally have a type of Char -> Maybe Char, to allow for proper error reporting.

parseEscapeSequence :: Parser Char
parseEscapeSequence = do
    c <- anyChar  -- Parsec's counterpart of ReadP's get
    case c of
        -- ...
        'u' -> parseUnicodeEscape4
        'U' -> parseUnicodeEscape8
        -- ...
        _ ->
            case maybeCharFromEscape c of
                Just result -> return result
                Nothing -> fail "Unrecognised escape sequence"
        
maybeCharFromEscape :: Char -> Maybe Char
maybeCharFromEscape c =
    case c of
        '\\' -> Just '\\'
        '0' -> Just '\0'
        't' -> Just '\t'
        'f' -> Just '\f'
        'r' -> Just '\r'
        'n' -> Just '\n'
        _ -> Nothing

For maybeCharFromEscape you could alternatively use lookup or a Map (as other answers point out), but you would still have to write down all the possibilities explicitly, and the result might be less efficient (though you may find it more readable).
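To show how a lookup-based table might slot into the string parser from the exercise, here is a minimal sketch. It assumes double-quoted strings and a small escape set. The names simpleEscape, escapedChar and stringLiteral are invented for this example.

```haskell
import Text.Parsec (anyChar, char, many, noneOf, parse, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical name: the lookup-based counterpart of maybeCharFromEscape.
simpleEscape :: Char -> Maybe Char
simpleEscape c = lookup c escapes
  where
    escapes = [('n', '\n'), ('t', '\t'), ('r', '\r'), ('\\', '\\'), ('"', '"')]

-- A backslash followed by one character; unknown escapes fail the parse.
escapedChar :: Parser Char
escapedChar = do
  _ <- char '\\'
  c <- anyChar
  maybe (fail "Unrecognised escape sequence") return (simpleEscape c)

-- A double-quoted string whose body mixes escapes and ordinary characters.
stringLiteral :: Parser String
stringLiteral = do
  _ <- char '"'
  s <- many (escapedChar <|> noneOf "\"\\")
  _ <- char '"'
  return s
```

With this, parse stringLiteral "" "\"a\\tb\"" gives Right "a\tb", and an input containing an unknown escape such as \q is rejected with the error message above.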
