C++: output contents of a Unicode file to console in Windows

I've read a bunch of articles and forum posts discussing this problem, and all of the solutions seem way too complicated for such a simple task.

Here's some sample code straight from cplusplus.com:

// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
  string line;
  ifstream myfile ("example.txt");
  if (myfile.is_open())
  {
    while ( getline (myfile,line) )  // loop ends as soon as a read fails
    {
      cout << line << endl;
    }
    myfile.close();
  }

  else cout << "Unable to open file"; 

  return 0;
}

It works fine as long as example.txt has only ASCII characters. Things get messy if I try to add, say, something in Russian.

In GNU/Linux it's as simple as saving the file as UTF-8.

In Windows, that doesn't work. Converting the file into UCS-2 Little Endian (what Windows seems to use by default) and changing all the functions into their wchar_t counterparts doesn't do the trick either.

Isn't there some kind of a "correct" way to get this done without doing all kinds of magic encoding conversions?


The Windows console supports Unicode, sort of. It does not support right-to-left and "complex" scripts. To print a UTF-16 file with Visual C++, use the following:

   _setmode(_fileno(stdout), _O_U16TEXT);   

And use wcout instead of cout.

There is no reliable support for a "UTF-8" code page, so for UTF-8 you will have to convert the text with MultiByteToWideChar.
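A minimal sketch of that UTF-8 conversion, in case it helps. The helper name utf8_to_wide is made up for illustration. The non-Windows branch is only there so the snippet compiles elsewhere; it assumes well-formed input and a 32-bit wchar_t, and does no validation.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert UTF-8 bytes to a wide string
// (UTF-16 on Windows, where wchar_t is 16 bits).
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
#ifdef _WIN32
    const int bytes = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units the result needs.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, nullptr, 0);
    if (len <= 0) return std::wstring();  // conversion failed
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    // Second call: convert into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], len);
    return wide;
#else
    // Fallback for other platforms: naive decoder, no validation (assumption).
    std::wstring wide;
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned char b = static_cast<unsigned char>(utf8[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        unsigned long cp = extra == 0 ? b : b & (0x3F >> extra);
        ++i;
        for (int k = 0; k < extra && i < utf8.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        wide.push_back(static_cast<wchar_t>(cp));
    }
    return wide;
#endif
}
```

With the _setmode call above in place, something like std::wcout << utf8_to_wide(line) should then print each line read from a UTF-8 file.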

More on console support for Unicode can be found in this blog.


The right way to output to a console on Windows using cout is to first call GetConsoleOutputCP, and then convert the input you have into the console code page. Alternatively, use WriteConsoleW, passing a wchar_t*.
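A rough sketch of the WriteConsoleW alternative, under a few assumptions: the helper name write_console_wide is invented here, and the GetConsoleMode check is one common way to detect that stdout is a real console rather than a file or pipe, where WriteConsoleW would fail. Outside Windows the function just returns false.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: write UTF-16 text straight to the console.
// Returns false if stdout is not a console (e.g. redirected to a file)
// or if this is not a Windows build.
bool write_console_wide(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || out == nullptr ||
        !GetConsoleMode(out, &mode)) {
        return false;  // not attached to a console
    }
    DWORD written = 0;
    return WriteConsoleW(out, text.data(),
                         static_cast<DWORD>(text.size()),
                         &written, nullptr) != 0;
#else
    (void)text;  // nothing to do on other platforms
    return false;
#endif
}
```

When it returns false, the caller would still need a fallback for redirected output, for example writing the bytes to the file in whatever encoding is wanted.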


For reading UTF-8 or UTF-16 strings from a file, you can use the extended mode string of _wfopen_s and fgetws. I don't think there is a C++ interface for these extensions yet. The easiest way to print to the console is described in Michael Kaplan's blog:

#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}

Avoid GetConsoleOutputCP; it is only retained for compatibility with the 8-bit API.
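For the file-reading half, here is a sketch of how the extended mode string might be combined with the _setmode call. The function name print_utf8_file and the 512-character buffer are arbitrary choices for illustration, and the code is guarded so that it only does real work when built with Visual C++.

```cpp
#include <stdio.h>
#include <wchar.h>
#ifdef _MSC_VER
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: read a UTF-8 text file and echo it to the console.
// Returns false if the file cannot be opened or on non-MSVC builds.
bool print_utf8_file(const wchar_t* path) {
#ifdef _MSC_VER
    FILE* fp = nullptr;
    // "ccs=UTF-8" makes the CRT decode the file to UTF-16 while reading.
    if (_wfopen_s(&fp, path, L"rt, ccs=UTF-8") != 0 || fp == nullptr)
        return false;
    _setmode(_fileno(stdout), _O_U16TEXT);  // stdout now expects wide output
    wchar_t line[512];
    while (fgetws(line, 512, fp) != nullptr)
        fputws(line, stdout);
    fclose(fp);
    return true;
#else
    (void)path;  // the ccs= mode flag is a Microsoft CRT extension
    return false;
#endif
}
```

A call such as print_utf8_file(L"example.txt") would stand in for the getline loop in the question.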


While Windows console windows are UCS-2 based, they don't support UTF-8 properly.

You might make things work by setting the console window's active output code page to UTF-8 temporarily, using the appropriate API functions. Note that those functions distinguish between input code page and output code page. However, cmd.exe really doesn't like UTF-8 as active code page, so don't set that as a permanent code page.
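One way to keep the change temporary is a small RAII guard around GetConsoleOutputCP and SetConsoleOutputCP. This is only a sketch: the class name Utf8OutputCodePage is made up, and on non-Windows builds it does nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical guard: switch the console output code page to UTF-8 and
// restore the previous one when the object goes out of scope.
class Utf8OutputCodePage {
public:
    Utf8OutputCodePage() {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();  // returns 0 if there is no console
        active_ = previous_ != 0 && SetConsoleOutputCP(CP_UTF8) != 0;
#endif
    }
    ~Utf8OutputCodePage() {
#ifdef _WIN32
        if (active_) SetConsoleOutputCP(previous_);  // put the old page back
#endif
    }
    // True only if the code page was actually changed.
    bool active() const { return active_; }

    Utf8OutputCodePage(const Utf8OutputCodePage&) = delete;
    Utf8OutputCodePage& operator=(const Utf8OutputCodePage&) = delete;

private:
    unsigned int previous_ = 0;
    bool active_ = false;
};
```

Declaring a Utf8OutputCodePage at the top of main and then writing UTF-8 bytes with printf or cout would leave the console's original code page in place once the program exits normally.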

Otherwise, you can use the Unicode console window functions.

Cheers & hth.,


#include <stdio.h>

int main (int argc, char *argv[])
{
    // do chcp 65001 in the console before running this
    printf ("γασσο γεο!\n");
}

Works perfectly if you do chcp 65001 in the console before running your program.

Caveats:

  • I'm using 64 bit Windows 7 with VC++ Express 2010
  • The code is in a file encoded as UTF-8 without BOM - I wrote it in a text editor, not using the VC++ IDE, then used VC++ to compile it.
  • The console has a TrueType font - this is important

Don't know if these things make too much difference...

Can't speak for chars off the BMP, give it a whirl and leave a comment.


Just to be clear, some here have mentioned UTF-8. UTF-8 is a multibyte encoding of Unicode. Windows documentation often uses "Unicode" to mean UTF-16, which uses two-byte code units (characters outside the BMP take two of them).

I've used this previously posted solution with Visual Studio 2008. I don't know if it works with later versions of Visual Studio.

   #include <iostream>
   #include <fcntl.h>
   #include <io.h>
   #include <tchar.h>

   <code omitted>


   _setmode(_fileno(stdout), _O_U16TEXT); 

   std::wcout << _T("This is some text to print\n");

I used macros to switch between std::wcout and std::cout, and also to remove the _setmode call for ASCII builds, which allows compiling for either ASCII or UNICODE. This works. I have not yet tested std::endl, but I think it might work with wcout and Unicode (not sure), i.e.

   std::wcout << _T("This is some text to print") << std::endl;