What is the best way to convert String to ByteString
What is the best way to convert a String to a ByteString in Haskell?
My gut reaction to the problem is
import qualified Data.ByteString as B
import Data.Char (ord)
packStr = B.pack . map (fromIntegral . ord)
But this doesn't seem satisfactory.
Here is my cheat sheet for Haskell String/Text/ByteString strict/lazy conversion assuming the desired encoding is UTF-8. The Data.Text.Encoding library has other encodings available.
With OverloadedStrings enabled, make sure not to write:
lazyByteString :: BL.ByteString
lazyByteString = "lazyByteString ä ß" -- BAD!
The literal is not UTF-8 encoded: the IsString instance for ByteString truncates each Char to its low 8 bits. Try
lazyByteString = BLU.fromString "lazyByteString ä ß" -- good
instead.
String literals of type 'Text' work fine with regard to encoding.
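To make the difference concrete, here is a minimal sketch that compares the bytes of the two literals. It assumes only the bytestring and text packages; the names truncated and encoded are made up for illustration, and the truncating behaviour is what the bytestring documentation describes for its IsString instance, so check it against the version you use.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- The literal goes through the IsString instance for ByteString,
-- which keeps only the low 8 bits of each Char (no UTF-8 encoding).
truncated :: BL.ByteString
truncated = "ä ß"

-- The literal is a lazy Text first and is then encoded explicitly.
encoded :: BL.ByteString
encoded = TLE.encodeUtf8 ("ä ß" :: TL.Text)

-- BL.unpack truncated == [228, 32, 223]
-- BL.unpack encoded   == [195, 164, 32, 195, 159]
```

Both characters in this example fit in one byte, so the truncated version happens to be Latin-1; for code points above 255 the truncation loses information.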
Cheat sheet:
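-- Qualified imports avoid clashes with Prelude names such as map and length
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString as BS
import qualified Data.Text as TS
import qualified Data.Text.Lazy as TL
import qualified Data.ByteString.Lazy.UTF8 as BLU -- from utf8-string
import qualified Data.ByteString.UTF8 as BSU -- from utf8-string
import qualified Data.Text.Encoding as TSE
import qualified Data.Text.Lazy.Encoding as TLE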
-- String <-> ByteString
BLU.toString :: BL.ByteString -> String
BLU.fromString :: String -> BL.ByteString
BSU.toString :: BS.ByteString -> String
BSU.fromString :: String -> BS.ByteString
-- String <-> Text
TL.unpack :: TL.Text -> String
TL.pack :: String -> TL.Text
TS.unpack :: TS.Text -> String
TS.pack :: String -> TS.Text
-- ByteString <-> Text
TLE.encodeUtf8 :: TL.Text -> BL.ByteString
TLE.decodeUtf8 :: BL.ByteString -> TL.Text
TSE.encodeUtf8 :: TS.Text -> BS.ByteString
TSE.decodeUtf8 :: BS.ByteString -> TS.Text
-- Lazy <-> Strict
BL.fromStrict :: BS.ByteString -> BL.ByteString
BL.toStrict :: BL.ByteString -> BS.ByteString
TL.fromStrict :: TS.Text -> TL.Text
TL.toStrict :: TL.Text -> TS.Text
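The entries above compose. As a rough sketch, a String can be taken to a lazy ByteString and back by going through strict Text. The helper names stringToLazyBytes and lazyBytesToString are hypothetical, and the sketch uses decodeUtf8' (the variant in Data.Text.Encoding that returns an Either) instead of decodeUtf8, on the assumption that you would rather not get an exception on invalid input.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as TS
import qualified Data.Text.Encoding as TSE

-- String -> strict Text -> strict ByteString (UTF-8) -> lazy ByteString
stringToLazyBytes :: String -> BL.ByteString
stringToLazyBytes = BL.fromStrict . TSE.encodeUtf8 . TS.pack

-- Lazy ByteString -> strict ByteString -> strict Text -> String.
-- Invalid UTF-8 is reported as Left instead of throwing.
lazyBytesToString :: BL.ByteString -> Either String String
lazyBytesToString bytes =
  case TSE.decodeUtf8' (BL.toStrict bytes) of
    Left err   -> Left (show err)
    Right text -> Right (TS.unpack text)
```

Note that BL.toStrict forces the whole lazy ByteString into memory, so for large inputs the lazy TLE functions may be a better fit.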
Please +1 Peaker's answer, because he correctly deals with encoding.
Data.ByteString.UTF8.fromString is also useful. The Char8 version will lose the Unicode-ness, and the UTF8 version will make a UTF-8-encoded ByteString. You have to choose one or the other.
A safe approach is to encode the Unicode string explicitly:
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
packStr'' :: String -> B.ByteString
packStr'' = encodeUtf8 . T.pack
Regarding the other answers: Data.ByteString.Char8.pack is effectively the same as the version in the question, and is unlikely to be what you want:
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Char (ord)
packStr, packStr', packStr'' :: String -> B.ByteString
packStr = B.pack . map (fromIntegral . ord)
packStr' = C.pack
packStr'' = encodeUtf8 . T.pack
*Main> packStr "hellö♥"
"hell\246e"
*Main> packStr' "hellö♥"
"hell\246e"
*Main> packStr'' "hellö♥"
"hell\195\182\226\153\165"
Data.ByteString.UTF8.fromString is fine, but requires the utf8-string package, while Data.Text.Encoding comes with the Haskell Platform.