Problem with iconv
If you are on Mac OS X 10.6 and are familiar with character encodings and the Terminal, please try this:
Open a terminal and type the following commands:
echo sørensen > test.txt
iconv -f UTF8 -t ISO-8859-1 test.txt
You will see the output: "sørensen". Can somebody explain what is going on?
UTF-8 is a multibyte encoding. The character ø is encoded as two bytes: C3 B8. In the encoding of your terminal (ISO-8859-1), these bytes are decoded as "Ã¸". Then you convert those bytes to the ISO-8859-1 code for ø. Any questions?
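The byte-level claim about ø can be checked from the shell. This is a minimal sketch, assuming a POSIX shell with "iconv" and "od" available. The bytes are written as octal escapes ('\303\270' is C3 B8) so the script does not depend on the terminal's own encoding:

```shell
#!/bin/sh
# ø (U+00F8) in UTF-8 is two bytes; od prints them in hex: c3 b8
printf '\303\270' | od -An -txC

# The same character in ISO-8859-1 is a single byte: f8
printf '\303\270' | iconv -f UTF-8 -t ISO-8859-1 | od -An -txC
```

So a correct UTF-8 to ISO-8859-1 conversion of ø shrinks two bytes to one.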
I ran "iconv" from one file to another and inspected the data with "od -txC", with the following results:
Input: c3 83 c2 b8 [2 UTF-8 characters: capital A tilde (Ã); cedilla (¸)]
Command: iconv -f utf-8 -t ISO-8859-1 < in.txt > out.txt
Output: c3 b8 [2 ISO-8859-1 characters: capital A tilde (Ã); cedilla (¸)]
So, the iconv conversion is correct.
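The experiment above can be reproduced without pasting special characters into the terminal. This is a minimal sketch, assuming a POSIX shell; the file names in.txt and out.txt follow the command shown, and the input bytes are generated with octal escapes:

```shell
#!/bin/sh
# in.txt holds the UTF-8 encoding of "Ã¸": c3 83 (Ã) and c2 b8 (¸)
printf '\303\203\302\270' > in.txt
od -An -txC in.txt     # c3 83 c2 b8

# Convert UTF-8 to ISO-8859-1: each of the two characters becomes one byte
iconv -f utf-8 -t ISO-8859-1 < in.txt > out.txt
od -An -txC out.txt    # c3 b8
```

The output bytes are exactly the ISO-8859-1 codes for Ã and ¸, which is why the conversion itself counts as correct.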
But if you instead treat the converted data as UTF-8 (which Terminal is apparently doing), C3 B8 is "ø" (o-slash).
If you change the character encoding in Terminal (Preferences > Advanced > Character Encoding) to "Western (ISO Latin 1)", you'll see C3 B8 as "Ã¸".
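Both readings of the same two bytes can also be compared without touching Terminal's preferences, by letting iconv do the reinterpretation. This is a sketch, assuming the terminal itself is set to UTF-8; out.txt is recreated here with the bytes C3 B8 so the snippet stands alone:

```shell
#!/bin/sh
# Recreate out.txt with the two bytes c3 b8
printf '\303\270' > out.txt

# Read as UTF-8 (what a UTF-8 terminal does): one character, ø
cat out.txt; echo

# Read as ISO-8859-1 instead, then transcode to UTF-8 so a UTF-8
# terminal can display that reading: two characters, Ã¸
iconv -f ISO-8859-1 -t UTF-8 out.txt; echo
```

The file never changes; only the encoding used to interpret its bytes does.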