Haskell: reading special characters from the console

I'd like to read in a string from the console that contains special characters like ö, ä, ü, µ, ... I've tried:

do ...
   ts <- getLine
   ...

but this doesn't work for those characters. For example, the Unicode code point for ö is \246, but if I use getLine to read in ö, Haskell reads in "\195\182". putStr "\195\182" displays ö, yet the string I get is "\195\182" (two characters) rather than "\246". What's the problem here? Do I need another function to read in those characters?
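To see why "\195\182" really is ö, here is a small sketch (base only; decodeTwoByte is a hypothetical helper, not a library function) that performs the UTF-8 arithmetic for a two-byte sequence by hand: the lead byte (110xxxxx) contributes its low 5 bits and the continuation byte (10xxxxxx) its low 6 bits.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decode one two-byte UTF-8 sequence into the character it encodes.
-- Longer or malformed sequences are rejected; this only illustrates the arithmetic.
decodeTwoByte :: Word8 -> Word8 -> Maybe Char
decodeTwoByte lead cont
  | lead .&. 0xE0 == 0xC0 && cont .&. 0xC0 == 0x80 =
      Just (chr ((fromIntegral (lead .&. 0x1F) `shiftL` 6)
                 .|. fromIntegral (cont .&. 0x3F)))
  | otherwise = Nothing

main :: IO ()
main = do
  -- The two "characters" getLine returned are really the bytes 195 and 182.
  print (decodeTwoByte 195 182)             -- Just '\246'
  print (fmap ord (decodeTwoByte 195 182))  -- Just 246
```

Since (195 .&. 0x1F) is 3 and (182 .&. 0x3F) is 54, the result is 3 * 64 + 54 = 246: the two values are the undecoded UTF-8 bytes of ö.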

I am using WinGHCi 7.0.3 on Windows XP. I'd be glad if someone could help me, because I haven't found anything so far.


@Judah Jacobson:

I tried it again, before typing any other commands, and got this:

Prelude> :m +System.IO
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> getLine
ασδφ
"\206\177\207\402\206\180\207\8224"
Prelude System.IO> putStr "\206\177\207\402\206\180\207\8224"
ασδφPrelude System.IO>

I also tried the Windows command chcp 65001, but it didn't change anything; UTF-8 was already active in Windows.


Since GHC 6.12, strings are encoded and decoded as UTF-8 on input and output (or with another encoding, depending on your locale setting), so make sure your locale is set to a UTF-8 one.
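A minimal way to check what GHC picked up from your locale, assuming a newer GHC whose base exports getLocaleEncoding from GHC.IO.Encoding (it may not be available in 7.0.3). The printed name varies by platform and locale; on Unix-like systems the locale usually comes from environment variables such as LANG or LC_ALL.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)

main :: IO ()
main = do
  enc <- getLocaleEncoding
  -- TextEncoding's Show instance prints the encoding's name, e.g. UTF-8.
  putStrLn ("locale encoding: " ++ show enc)
```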

You can also control this manually via the text package, which provides explicit encoding and decoding functions for several encodings.
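A sketch of doing the decoding yourself, assuming the text and bytestring packages (both ship with GHC): read raw bytes, for instance with Data.ByteString.getLine, and decode them explicitly. To keep the contrast reproducible, the example uses a fixed byte string (oeBytes, a name chosen for illustration) instead of console input.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The two bytes getLine returned in the question: the UTF-8 encoding of ö.
oeBytes :: B.ByteString
oeBytes = B.pack [195, 182]

main :: IO ()
main = do
  -- Decoding as UTF-8 yields the single character U+00F6.
  -- (decodeUtf8 throws an exception on invalid UTF-8 input.)
  print (T.unpack (TE.decodeUtf8 oeBytes))    -- "\246"
  -- Decoding as Latin-1 maps each byte to one Char: the symptom from the question.
  print (T.unpack (TE.decodeLatin1 oeBytes))  -- "\195\182"
  -- For real console input you would read raw bytes instead:
  --   bytes <- B.getLine
  --   let line = TE.decodeUtf8 bytes
```

Reading bytes and decoding them yourself makes the result independent of whatever encoding the stdin Handle happens to carry.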


You need to set the encoding of stdin to UTF-8. For me, it is initially CP437 in GHCi on Windows XP and UTF-8 on Mac.

Check the current setting with hGetEncoding stdin (from System.IO), then set it with hSetEncoding stdin utf8; it should work after that.
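The same fix in a compiled program might look like this sketch (base only). It reports the encoding stdin started with, switches it to UTF-8 before any input is read, and prints the code points of the line so you can tell \246 from \195\182 regardless of how the console renders them.

```haskell
import System.IO

main :: IO ()
main = do
  -- What the runtime chose for stdin (the name depends on platform and console).
  before <- hGetEncoding stdin
  putStrLn ("stdin encoding before: " ++ show before)
  -- Switch to UTF-8 before any line input is performed.
  hSetEncoding stdin utf8
  line <- getLine
  -- show escapes non-ASCII characters, so ö appears as "\246" when decoding worked.
  print line
  -- The raw code points, e.g. [246] for a single ö.
  print (map fromEnum line)
```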

Edit: This is what it looks like on my Mac:

Prelude System.IO> hSetEncoding stdin latin1
Prelude System.IO> str <- getLine
ö
Prelude System.IO> putStr str
Ã¶Prelude System.IO> print str
"\195\182"
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> str <- getLine
ö
Prelude System.IO> putStr str
öPrelude System.IO> print str
"\246"
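If a string has already been read with the wrong encoding, as in the latin1 session above, it can be repaired after the fact. A minimal sketch using base's GHC.Foreign module (recode is a hypothetical helper name): encode the String back to bytes with the encoding that was wrongly used, then decode those bytes with the right one.

```haskell
import qualified GHC.Foreign as F
import System.IO (TextEncoding, latin1, utf8)

-- Turn a String into bytes with one encoding, then read the bytes back with another.
-- recode latin1 utf8 reverses a string that was decoded as Latin-1 but was really
-- UTF-8; it only makes sense if every Char in the input is below 256.
recode :: TextEncoding -> TextEncoding -> String -> IO String
recode from to s = F.withCStringLen from s (F.peekCStringLen to)

main :: IO ()
main = do
  fixed <- recode latin1 utf8 "\195\182"
  print fixed  -- "\246"
```

Setting the right encoding before reading is still the cleaner fix; this only helps when the wrongly decoded string is all you have.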


I was able to reproduce your error; this looks like a bug in WinGHCi. By default, GHC on Windows uses the Win32 "console code page" to encode and decode Handle I/O. However, WinGHCi sends input to GHC as UTF-8-encoded bytes, but incorrectly has the code page set to 1252 (Windows Latin-1).
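This explanation can be checked directly: encoding ασδφ as UTF-8 and decoding those bytes as code page 1252 should reproduce the string from the asker's transcript. A sketch using GHC.Foreign; it assumes mkTextEncoding accepts the name "CP1252" on your platform (Windows code pages natively, iconv elsewhere), and misdecode is a hypothetical helper name.

```haskell
import qualified GHC.Foreign as F
import System.IO (mkTextEncoding, utf8)

-- ασδφ written with escapes so the source file's own encoding does not matter.
greek :: String
greek = "\945\963\948\966"

-- Encode with UTF-8 (what WinGHCi sends), decode with code page 1252 (what GHC assumed).
misdecode :: IO String
misdecode = do
  cp1252 <- mkTextEncoding "CP1252"  -- assumption: this name is supported here
  F.withCStringLen utf8 greek (F.peekCStringLen cp1252)

-- Expected: "\206\177\207\402\206\180\207\8224", as in the transcript.
main :: IO ()
main = misdecode >>= print
```

The \402 (ƒ) and \8224 (†) come from bytes 0x83 and 0x86, which code page 1252 maps differently from ISO Latin-1.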

I was able to work around this bug using Mike Hartl's answer: run hSetEncoding stdin utf8 before performing any line-input commands. For example:

Prelude> :m +System.IO
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> getLine
ασδφ
"\945\963\948\966"

If that doesn't work for you, please let us know what you get when you run the above commands.

Alternatively, you will probably have better luck with Unicode using the plain "GHCi" program (which, admittedly, has a less polished GUI).
