How can I cin and cout some unicode text?
I ask a code snippet which cin a unicode text, concatenates another unicode one to the first unicode text and the cout the result.
P.S. This code will help me to solve another bigger problem with unicod开发者_JAVA技巧e. But before the key thing is to accomplish what I ask.
ADDED: BTW I can't write in the command line any unicode symbol when I run the executable file. How I should do that?
I had a similar problem in the past, in my case imbue
and sync_with_stdio
did the trick. Try this:
#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
wcin.imbue(locale("en_US.UTF-8"));
wcout.imbue(locale("en_US.UTF-8"));
wstring s;
wstring t(L" la Polynésie française");
wcin >> s;
wcout << s << t << endl;
return 0;
}
Depending on what type unicode you mean. I assume you mean you are just working with std::wstring
though. In that case use std::wcin
and std::wcout
.
For conversion between encodings you can use your OS functions like for Win32: WideCharToMultiByte
, MultiByteToWideChar
or you can use a library like libiconv
Here is an example that shows four different methods, of which only the third (C conio
) and the fourth (native Windows API) work (but only if stdin/stdout aren't redirected). Note that you still need a font that contains the character you want to show (Lucida Console supports at least Greek and Cyrillic). Note that everything here is completely non-portable, there is just no portable way to input/output Unicode strings on the terminal.
#ifndef UNICODE
#define UNICODE
#endif
#ifndef _UNICODE
#define _UNICODE
#endif
#define STRICT
#define NOMINMAX
#define WIN32_LEAN_AND_MEAN
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
#include <conio.h>
#include <windows.h>
void testIostream();
void testStdio();
void testConio();
void testWindows();
int wmain() {
testIostream();
testStdio();
testConio();
testWindows();
std::system("pause");
}
void testIostream() {
std::wstring first, second;
std::getline(std::wcin, first);
if (!std::wcin.good()) return;
std::getline(std::wcin, second);
if (!std::wcin.good()) return;
std::wcout << first << second << std::endl;
}
void testStdio() {
wchar_t buffer[0x1000];
if (!_getws_s(buffer)) return;
const std::wstring first = buffer;
if (!_getws_s(buffer)) return;
const std::wstring second = buffer;
const std::wstring result = first + second;
_putws(result.c_str());
}
void testConio() {
wchar_t buffer[0x1000];
std::size_t numRead = 0;
if (_cgetws_s(buffer, &numRead)) return;
const std::wstring first(buffer, numRead);
if (_cgetws_s(buffer, &numRead)) return;
const std::wstring second(buffer, numRead);
const std::wstring result = first + second + L'\n';
_cputws(result.c_str());
}
void testWindows() {
const HANDLE stdIn = GetStdHandle(STD_INPUT_HANDLE);
WCHAR buffer[0x1000];
DWORD numRead = 0;
if (!ReadConsoleW(stdIn, buffer, sizeof buffer, &numRead, NULL)) return;
const std::wstring first(buffer, numRead - 2);
if (!ReadConsoleW(stdIn, buffer, sizeof buffer, &numRead, NULL)) return;
const std::wstring second(buffer, numRead);
const std::wstring result = first + second;
const HANDLE stdOut = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD numWritten = 0;
WriteConsoleW(stdOut, result.c_str(), result.size(), &numWritten, NULL);
}
- Edit 1: I've added a method based on
conio
. - Edit 2: I've messed around with
_O_U16TEXT
a bit as described in Michael Kaplan's blog, but that seemingly only hadwgets
interpret the (8-bit) data fromReadFile
as UTF-16. I'll investigate this a bit further during the weekend.
If you have actual text (i.e., a string of logical characters), then insert to the wide streams instead. The wide streams will automatically encode your characters to match the bits expected by the locale encoding. (And if you have encoded bits instead, the streams will decode the bits, then re-encode them to match the locale.)
There is a lesser solution if you KNOW you have UTF-encoded bits (i.e., an array of bits intended to be decoded into a string of logical characters) AND you KNOW the target of the output stream is expecting that very same bit-format, then you can skip the decoding and re-encoding steps and write() the bits as-is. This only works when you know both sides use the same encoding format, which may be the case for small utilities not intended to communicate with processes in other locales.
It depends on the OS. If your OS understands you can simply send it UTF-8 sequences.
精彩评论