putwchar / getwchar encoding?
I'm writing code which runs on both Windows and Linux. The application works with Unicode strings, and I'm looking to output them to the console using common code.
Will putwchar and getwchar do the trick? For example, can I provide Unicode character values to these functions, and will they display the same character on both Linux and Windows?
You are about to enter a world of pain. Invariably, *nix consoles prefer you to send them UTF-8 encoded char* data.
Windows, on the other hand, uses UTF-16 for its Unicode APIs, and for the console APIs I believe it is limited to UCS-2.
You probably need to find some library code that abstracts away the differences for you. I don't have a good recommendation, but I am sure that putwchar and getwchar are not the solution.
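For illustration, here is a minimal sketch of what such an abstraction might look like. The write_console_wide() helper is hypothetical, not from any library: on Windows it hands the UTF-16 wchar_t string straight to WriteConsoleW, and on Linux it converts the wide string to the locale's multibyte encoding (UTF-8 in a UTF-8 locale) with wcstombs before writing the bytes.

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: write a wide string to the console on either platform. */
static void write_console_wide(const wchar_t *ws)
{
#ifdef _WIN32
    /* WriteConsoleW takes UTF-16 directly, bypassing the C runtime's
       codepage translation. */
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    WriteConsoleW(h, ws, (DWORD)wcslen(ws), &written, NULL);
#else
    /* Convert the wide string to the locale's multibyte encoding
       (UTF-8 in a UTF-8 locale) and write the resulting bytes. */
    size_t len = wcstombs(NULL, ws, 0);
    if (len == (size_t)-1)
        return;                      /* string not representable in this locale */
    char *buf = malloc(len + 1);
    if (!buf)
        return;
    wcstombs(buf, ws, len + 1);
    fputs(buf, stdout);
    free(buf);
#endif
}

int main(void)
{
    setlocale(LC_ALL, "");           /* pick up the user's locale, e.g. en_US.UTF-8 */
    write_console_wide(L"Кошка\n");
    return 0;
}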
One of the many ways to reconcile them is to use explicit conversion modes in Windows:
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif
#include <wchar.h>
#include <stdio.h>
#include <locale.h>

int main(void)
{
#ifdef _WIN32
    /* Switch stdout to wide-character text mode so wide output works. */
    _setmode(_fileno(stdout), _O_WTEXT);
#else
    /* Select a UTF-8 locale so wide output is converted to UTF-8 bytes. */
    setlocale(LC_ALL, "en_US.UTF-8");
#endif
    fputws(L"Кошка\n", stdout);
    return 0;
}
Tested with GCC 4.6.1 on Linux and Visual Studio 2010 on Windows.
There are also _O_U8TEXT and _O_U16TEXT modes on Windows. Your mileage may vary.
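As a rough sketch of the UTF-16 variant, assuming a Windows-only translation unit, you would swap in the other mode constant and keep using wide output functions on that stream:

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* _O_U16TEXT expects UTF-16 wide-character output on the stream;
       _O_U8TEXT would translate wide output to UTF-8 bytes instead. */
    _setmode(_fileno(stdout), _O_U16TEXT);
    fputws(L"Кошка\n", stdout);
    return 0;
}
#endif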
See the putwchar man page on Linux. It says that the behavior depends on LC_CTYPE, and that "It is reasonable to expect that putwchar() will actually write the multibyte sequence corresponding to the wide character wc." Similarly, getwchar() should read a multibyte sequence from standard input and return it as a wide character.
Don't assume that they will read or write a constant number of bytes as they would in UCS-2.
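A minimal sketch of that behavior, assuming the program inherits a UTF-8 locale from the environment, is a wide-character echo loop: each getwchar() call decodes one multibyte sequence from stdin, and each putwchar() call re-encodes it, however many bytes that takes.

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* LC_CTYPE from the environment determines the multibyte encoding that
       getwchar()/putwchar() convert from and to (UTF-8 in a UTF-8 locale). */
    setlocale(LC_ALL, "");

    wint_t wc;
    while ((wc = getwchar()) != WEOF)
        putwchar(wc);   /* writes however many bytes the character needs */

    return 0;
}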
All that said, character-by-character I/O usually isn't the fastest approach; when you start optimizing, keep in mind that on Linux and Unix you'll be working in UTF-8.