Convert Between Latin1-encoded Data.ByteString and Data.Text
Since the latin-1 (aka ISO-8859-1) character set is embedded in the Unicode character set as its lowest 256 code points, I'd expect the conversion to be trivial, but I didn't see any latin-1 encoding conversion functions in Data.Text.Encoding, which contains only conversion functions for the common UTF encodings.
What's the recommended and/or efficient way to convert between Data.ByteString values encoded in latin-1 and Data.Text values?
The answer is right at the top of the page you linked:
"To gain access to a much larger family of encodings, use the text-icu package: http://hackage.haskell.org/package/text-icu"
A quick GHCi example:
λ> import Data.Text.ICU.Convert
λ> conv <- open "ISO-8859-1" Nothing
λ> Data.Text.IO.putStrLn $ toUnicode conv $ Data.ByteString.pack [198, 216, 197]
ÆØÅ
λ> Data.ByteString.unpack $ fromUnicode conv $ Data.Text.pack "ÆØÅ"
[198,216,197]
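However, as you pointed out, in the specific case of latin-1 the code points coincide with Unicode, so you can use pack/unpack from Data.ByteString.Char8 to perform the trivial mapping from latin-1 to and from String, which you can then convert to Text using the corresponding pack/unpack from Data.Text. Be aware that Data.ByteString.Char8.pack silently truncates any character above U+00FF to its low 8 bits, so the encoding direction is only safe for text that latin-1 can actually represent.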
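A minimal sketch of the Char8-based approach as reusable functions. The function names are hypothetical, chosen for illustration. It assumes a reasonably recent version of the text package, which also ships Data.Text.Encoding.decodeLatin1; that function is used here only to cross-check the decoding direction. The checked encoder is an assumed helper, not part of any library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode latin-1 bytes: BC.unpack turns each byte b into the Char with
-- code point b, which is exactly the latin-1 -> Unicode mapping.
latin1ToText :: B.ByteString -> T.Text
latin1ToText = T.pack . BC.unpack

-- Same result via the latin-1 decoder in newer versions of the text package.
latin1ToTextTE :: B.ByteString -> T.Text
latin1ToTextTE = TE.decodeLatin1

-- Encode to latin-1. BC.pack keeps only the low 8 bits of each Char,
-- so characters above U+00FF are silently mangled (unchecked).
textToLatin1 :: T.Text -> B.ByteString
textToLatin1 = BC.pack . T.unpack

-- Hypothetical checked variant: returns Nothing when the text contains
-- a character outside the latin-1 range instead of truncating it.
textToLatin1Checked :: T.Text -> Maybe B.ByteString
textToLatin1Checked t
  | T.all (<= '\255') t = Just (textToLatin1 t)
  | otherwise           = Nothing
```

Decoding can never fail, because every byte value 0 to 255 is a valid latin-1 code point; only encoding needs a check, which is why the checked variant exists for that direction alone. For example, textToLatin1Checked (T.pack "€") gives Nothing, whereas textToLatin1 would emit the single byte 0xAC.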