Haskell: Escaped character from character
I'm writing a Parsec parser that reads in strings and converts escaped characters, as part of exercise 3 here.
For that exercise I am using this function:
escapedCharFromChar :: Char -> Char
escapedCharFromChar c = read $ concat ["'\\",[c],"'"]
I'm not too impressed with the use of read to convert the character x into the escape character with the name x. Can anyone suggest a more elegant function of type Char -> Char to do this?
One way is to lay out the cases exhaustively:
charFromEscape :: Char -> Char
charFromEscape 'n' = '\n'
charFromEscape 't' = '\t'
--- ... --- Help!
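Spelled out in full, the exhaustive version might look like the sketch below. The list of cases is an assumption: it mirrors the single-character escapes in GHC's lexEscChar table quoted further down, and the catch-all that calls error (rather than returning something like Nothing) is just one possible choice.

```haskell
-- Every single-character escape that GHC's lexer accepts after a backslash
-- (a, b, f, n, r, t, v, backslash, double quote, single quote),
-- plus a catch-all that names the offending character.
charFromEscape :: Char -> Char
charFromEscape 'a'  = '\a'  -- alert (bell)
charFromEscape 'b'  = '\b'  -- backspace
charFromEscape 'f'  = '\f'  -- form feed
charFromEscape 'n'  = '\n'  -- newline
charFromEscape 'r'  = '\r'  -- carriage return
charFromEscape 't'  = '\t'  -- horizontal tab
charFromEscape 'v'  = '\v'  -- vertical tab
charFromEscape '\\' = '\\'  -- backslash
charFromEscape '"'  = '"'   -- double quote
charFromEscape '\'' = '\''  -- single quote
-- Assumption: crashing with a descriptive message is acceptable here.
charFromEscape c    = error ("charFromEscape: unrecognised escape \\" ++ [c])
```

The function is still partial, but at least the failure message says which escape was unknown.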
You could also use lookup:
-- this import goes at the top of your source file
import Data.Maybe (fromJust)

charFromEscape :: Char -> Char
charFromEscape c = fromJust $ lookup c escapes
  where escapes = [('n', '\n'), ('t', '\t')] -- and so on
The fromJust bit may look strange. The type of lookup is
lookup :: (Eq a) => a -> [(a, b)] -> Maybe b
This means that, given a key of some type on which equality is defined and a lookup table, lookup tries to give you the corresponding value from the table. But your key isn't guaranteed to be present in the table! That's the purpose of Maybe, whose definition is
data Maybe a = Just a | Nothing
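A small sketch of both outcomes, using the same two-entry table as charFromEscape; the names present and absent are illustrative only:

```haskell
-- The same two-entry association list used by charFromEscape.
escapes :: [(Char, Char)]
escapes = [('n', '\n'), ('t', '\t')]

-- A key that is in the table comes back wrapped in Just ...
present :: Maybe Char
present = lookup 'n' escapes

-- ... and a key that is missing yields Nothing instead of an error.
absent :: Maybe Char
absent = lookup 'r' escapes
```

In GHCi, present evaluates to Just '\n' and absent to Nothing.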
With fromJust, you assume you got Just something (i.e., c has an entry in escapes), but this falls apart when that assumption is invalid:
ghci> charFromEscape 'r'
*** Exception: Maybe.fromJust: Nothing
These examples will move you along in the exercise, but it's clear that you'd like better error handling. Also, if you expect the lookup table to be large, you may want to look at Data.Map.
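One way to get both, sketched under the assumption that the caller is happy to receive a Maybe Char: keep the table in a Data.Map (from the containers package that ships with GHC) and return the Maybe from Map.lookup directly instead of unwrapping it with fromJust. The name escapeTable and the exact set of entries are illustrative.

```haskell
import qualified Data.Map as Map

-- Escape table keyed by the character that follows the backslash.
escapeTable :: Map.Map Char Char
escapeTable = Map.fromList
  [ ('a', '\a'), ('b', '\b'), ('f', '\f'), ('n', '\n')
  , ('r', '\r'), ('t', '\t'), ('v', '\v')
  , ('\\', '\\'), ('"', '"'), ('\'', '\'')
  ]

-- Unknown escapes come back as Nothing instead of crashing the program,
-- so the caller decides how to report the error.
charFromEscape :: Char -> Maybe Char
charFromEscape c = Map.lookup c escapeTable
```

The caller, for example a parser, can then turn Nothing into a proper error message with a case expression or maybe.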
read (or rather, Text.Read.Lex.lexCharE) is how you get at GHC's internal table, which is defined as:
lexEscChar =
  do c <- get
     case c of
       'a'  -> return '\a'
       'b'  -> return '\b'
       'f'  -> return '\f'
       'n'  -> return '\n'
       'r'  -> return '\r'
       't'  -> return '\t'
       'v'  -> return '\v'
       '\\' -> return '\\'
       '\"' -> return '\"'
       '\'' -> return '\''
       _    -> pfail
Eventually, you have to define the semantics somewhere. You can do it in your program, or you can reuse GHC's.
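If you do want to reuse GHC's semantics, a safer variant of the read-based function from the question is to use readMaybe from Text.Read, which returns Nothing instead of throwing on bad input. This is a sketch; note that it changes the type to Char -> Maybe Char.

```haskell
import Text.Read (readMaybe)

-- Build the source text of a character literal: for 'n' that is the four
-- characters  ' \ n '  and let GHC's Read Char instance interpret it.
-- readMaybe returns Nothing instead of throwing when c is not a valid escape.
charFromEscape :: Char -> Maybe Char
charFromEscape c = readMaybe ['\'', '\\', c, '\'']
```

Because this goes through GHC's full character lexer, it accepts exactly what GHC accepts after a backslash in a character literal, which includes some inputs that are not letter escapes (a single decimal digit, for instance, is read as a numeric escape).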
I just used pattern matching for the few escapes I cared about (e.g. 't' -> '\t'). The solutions other readers suggested were similar. Not very generic, but very straightforward.
You should consider implementing a proper Parser Char function instead of a Char -> Char one. (Or, if you keep a separate character-mapping function alongside the parser anyway, consider giving it the type Char -> Maybe Char instead.)
The Char -> Char approach only works if every escape sequence consists of a backslash and exactly one other character. Some languages have more complex escapes made up of a longer sequence of characters. For example, C++ supports multi-character escape sequences such as \u005C (which represents the Unicode code point U+005C).
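import Text.Parsec (anyChar)
import Text.Parsec.String (Parser)

parseEscapeSequence :: Parser Char
parseEscapeSequence = do
  c <- anyChar  -- the character right after the backslash
  case c of
    '\\' -> return '\\'
    '0'  -> return '\0'
    't'  -> return '\t'
    'f'  -> return '\f'
    'r'  -> return '\r'
    'n'  -> return '\n'
    -- ...
    'u'  -> parseUnicodeEscape4  -- \uXXXX, described below
    'U'  -> parseUnicodeEscape8  -- \UXXXXXXXX, described below
    -- ...
    _    -> fail "Unrecognised escape sequence"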
Here, parseUnicodeEscape4 and parseUnicodeEscape8 would each parse a fixed number of hexadecimal digits and convert them into a Unicode character, likely by first converting each digit into an integer in the 0..15 range, then combining those 'nibbles' into a larger integer, and finally converting that integer into a Unicode character.
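A minimal Parsec sketch of those two helpers, following the digit-to-nibble approach just described. parseHexEscape is a hypothetical shared helper, and rejecting values above 0x10FFFF (the largest Unicode code point, and the limit of Data.Char.chr) is an assumption about how out-of-range escapes should be reported.

```haskell
import Data.Char (chr, digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: parse exactly n hex digits and return the character
-- with that code point. Each digit becomes a nibble (0..15) via digitToInt,
-- and the nibbles are combined most significant first.
parseHexEscape :: Int -> Parser Char
parseHexEscape n = do
  digits <- count n hexDigit
  let code = foldl (\acc d -> acc * 16 + digitToInt d) 0 digits
  -- chr throws for values above 0x10FFFF, so turn that into a parse error.
  if code > 0x10FFFF
    then fail ("Unicode escape out of range: " ++ digits)
    else return (chr code)

-- \uXXXX: exactly 4 hex digits follow the 'u'.
parseUnicodeEscape4 :: Parser Char
parseUnicodeEscape4 = parseHexEscape 4

-- \UXXXXXXXX: exactly 8 hex digits follow the 'U'.
parseUnicodeEscape8 :: Parser Char
parseUnicodeEscape8 = parseHexEscape 8
```

With these definitions, parseUnicodeEscape4 turns the input 005C into a backslash, matching the C++ example above; hexDigit accepts both upper- and lowercase digits.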
You could alternatively offload the simple escape sequences to a separate function that does pattern matching, but that function should ideally have the type Char -> Maybe Char so that errors can be reported properly.
parseEscapeSequence :: Parser Char
parseEscapeSequence = do
  c <- anyChar  -- the character right after the backslash
  case c of
    -- ...
    'u' -> parseUnicodeEscape4
    'U' -> parseUnicodeEscape8
    -- ...
    _ ->
      -- anything else is handed to the simple-escape table
      case maybeCharFromEscape c of
        Just result -> return result
        Nothing     -> fail "Unrecognised escape sequence"
maybeCharFromEscape :: Char -> Maybe Char
maybeCharFromEscape c =
  case c of
    '\\' -> Just '\\'
    '0'  -> Just '\0'
    't'  -> Just '\t'
    'f'  -> Just '\f'
    'r'  -> Just '\r'
    'n'  -> Just '\n'
    _    -> Nothing
You could alternatively implement maybeCharFromEscape via lookup or a Map (as other answers point out), but you would still have to write down every possibility explicitly, and the result might be less efficient (though you may find it more readable).