wprintf prints UTF16 (should be UTF8) on Linux?
1. It's really strange that wprintf shows 'Ω' as 3A9 (UTF16), but wctomb converts the wchar to CEA9 (UTF8). My locale is the default en_US.utf8. According to the man pages, both should conform to my locale, but wprintf uses UTF16. Why?
excerpt from http://www.fileformat.info/info/unicode/char/3a9/index.htm
Ω in UTF
UTF-8 (hex) 0xCE 0xA9 (cea9)
UTF-16 (hex) 0x03A9 (03a9)
2. wprintf and printf cannot be used in the same program; I have to choose one or the other. Why?
See my program:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
int main(void) {
    setlocale(LC_ALL, ""); // inherit locale settings from the environment
    int r;
    char wc_char[4] = {0, 0, 0, 0};
    wchar_t myChar1 = L'Ω'; // Greek capital omega
    // should comment out either wprintf or printf, they don't run together
    r = wprintf(L"char is %lc (%x)\n", (wint_t)myChar1, (unsigned)myChar1); // On Linux, to UTF16
    r = wctomb(wc_char, myChar1); // On Linux, to UTF8
    // cast to unsigned char so that bytes >= 0x80 are not sign-extended by %x
    r = printf("r:%d, %x, %x, %x, %x\n", r,
               (unsigned char)wc_char[0], (unsigned char)wc_char[1],
               (unsigned char)wc_char[2], (unsigned char)wc_char[3]);
    return 0;
}
The answer to your second question has to do with stream orientation. You cannot mix printf() and wprintf() on the same stream because they require different orientations.
When the process starts, the standard streams have no orientation yet. The first I/O call on a stream sets it: printf() makes the stream byte-oriented, and wprintf() makes it wide-oriented. Once set, the orientation stays fixed unless the stream is reopened with freopen().
Calling a function that requires a different orientation than the stream's current one is undefined behavior.
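To see the orientation rule in action, here is a minimal sketch built on the standard fwide() function. The helper name stream_orientation is made up for this illustration; only fwide() itself comes from <wchar.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (name invented for this sketch).
   Returns 1 if the stream is wide-oriented, -1 if it is byte-oriented,
   and 0 if no orientation has been set yet.
   Passing mode 0 to fwide() only queries; it never changes the orientation. */
int stream_orientation(FILE *stream)
{
    assert(stream != NULL);
    int o = fwide(stream, 0);
    if (o > 0)
        return 1;
    if (o < 0)
        return -1;
    return 0;
}
```

Calling stream_orientation(stdout) before any output should report 0, and after the first printf() it should report -1. If the goal is just to print a wchar_t alongside ordinary text, one option is to stay byte-oriented and use printf() with the %lc or %ls conversions instead of switching to wprintf().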
How exactly are you determining what the wprintf line is printing? Your comment below the question seems to imply that you're just examining the result of wprintf(L"%x", myChar1), which prints the internal numeric value of myChar1 regardless of character encoding (but not regardless of character set; there's a difference). Assuming that your compiler uses Unicode for wchar_t internally (a pretty safe bet, I believe), this simply prints the Unicode code point of 'Ω', which is 0x3a9, independently of any UTF-16 vs. UTF-8 distinction. To tell whether wprintf is printing UTF-16, you have to examine the raw bytes that are output (e.g., with hexdump(1)). For example, on my computer, the wprintf line prints the following:
63 68 61 72 20 69 73 20 ce a9 20 28 33 61 39 29 0a
c  h  a  r     i  s     Ω        (  3  a  9  )  \n
Note that the omega is encoded in UTF-8 as the bytes CE A9, but the numeric value of the wchar_t is still 3A9.
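The difference between a code point and its encoded bytes can be made concrete with a small hand-written encoder. This is only a sketch of the standard UTF-8 bit layout; the function name utf8_encode is invented here, and it is not what wprintf or wctomb call internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (name invented for this sketch).
   Encodes one Unicode code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not a
   valid scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this, utf8_encode(0x3A9, buf) returns 2 and fills buf with CE A9: the same two bytes that appear in the hexdump, while the value passed in is still 3A9.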
Ahh, I may have found it. You need to call setlocale(LC_ALL, "") first. A C program starts in the "C" locale, so the wchar I/O functions do not honor the LC_* environment variables until setlocale() has been called.
See http://littletux.homelinux.org/knowhow.php?article=charsets/ar01s08 for more background.
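Since setlocale() can fail when the environment names a locale that is not installed, it may be worth checking its return value. A minimal sketch, assuming a helper named init_locale (a name invented for this example):

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch).
   Tries to adopt the locale named by the LC_* / LANG environment variables.
   If that fails, the program simply stays in its current locale (the "C"
   locale at startup), and that locale's name is returned instead. */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, NULL);   /* NULL only queries the current locale */
    assert(name != NULL);
    return name;
}
```

Calling init_locale() at the top of main() and printing the returned name is a quick way to confirm which locale the wide-character functions will actually use.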