
How to use UTF-8 character arrays in C++?

Is it possible to make char* strings work with UTF-8 encoding in C++ (VC2010)?

For example, if my source file is saved in UTF-8 and I write something like this:

const char* c = "aäáéöő";

Is it possible to make this UTF-8 encoded? And if so, how can I use

char* c2 = new char[strlen("aäáéöő") + 1];

for dynamic allocation, if characters can be of variable length?


The encoding for narrow character string literals is implementation defined, so you'd really have to read the documentation (if you can find it). A quick experiment shows that both VC++ (VC8, anyway) and g++ (4.4.2, anyway) actually just copy the bytes from the source file; the string literal will be in whatever encoding your editor saved it in. (This is clearly in violation of the standard, but it seems to be common practice.)
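The quick experiment described above can be reproduced by printing the bytes a literal actually contains. A minimal sketch, assuming any C++11 compiler; `hex_bytes` is a hypothetical helper, not a library function:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for the experiment: render each byte of a narrow
// string as two lowercase hex digits, separated by spaces.
std::string hex_bytes(const char* s) {
    std::string out;
    char buf[4];
    for (; *s != '\0'; ++s) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned v = static_cast<unsigned char>(*s);
        std::snprintf(buf, sizeof buf, "%02x", v);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, `hex_bytes("ä")` would typically show `c3 a4` if the compiler copied the bytes from a UTF-8 source file, and `e4` if the literal ended up in code page 1252.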

C++11 has UTF-8 string literals, which let you write u8"text" and be assured that "text" is encoded in UTF-8. But I don't really expect this to work reliably: to do it, the compiler has to know what encoding your source file uses. In all probability, compiler writers will keep ignoring the issue, just copying the bytes from the source file, and achieve conformance simply by documenting that the source file must be in UTF-8 for these features to work.
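One way to sidestep the source-encoding problem, assuming a C++11 (or later) compiler, is to spell non-ASCII characters as universal character names inside a u8 literal, so the source file itself stays pure ASCII. A sketch; `from_u8` and `a_umlaut_utf8` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy a u8 string literal into a std::string.
// u8"..." is const char[N] in C++11/14/17 but const char8_t[N] in C++20,
// so the template accepts either one-byte element type.
template <typename CharT, std::size_t N>
std::string from_u8(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expected an array of UTF-8 code units");
    return std::string(reinterpret_cast<const char*>(lit), N - 1);  // drop the NUL
}

// \u00E4 is the universal character name for U+00E4 ('a' with umlaut).
// Written this way, the bytes do not depend on how the editor saved the file.
std::string a_umlaut_utf8() { return from_u8(u8"a\u00E4"); }
```

The template exists only because C++20 changed the element type of u8 literals to char8_t; with a C++11/14/17 compiler, `const char* s = u8"a\u00E4";` also works directly.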


If the text you want to put in the string is in your source code, make sure your source code file is in UTF-8.

If that doesn't work, try using \u1234 escapes, where 1234 is the hexadecimal code point value.
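A \u escape names a code point; which bytes end up in memory depends on the target encoding. As an illustration, assuming UTF-8 is the target (as it is for a u8 literal), this sketch encodes a single code point by hand. `encode_utf8` is a hypothetical helper, and it assumes a valid Unicode scalar value:

```cpp
#include <cassert>
#include <string>

// Encode one code point (e.g. the value in a \u1234 escape) as UTF-8 bytes.
// Assumes cp <= 0x10FFFF and cp is not a surrogate (0xD800..0xDFFF).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, U+1234 becomes the three bytes E1 88 B4, and 'ő' (U+0151) becomes C5 91. A plain "\u1234" narrow literal, by contrast, is converted to the compiler's execution character set, which is not necessarily UTF-8.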

You could also try the UTF8-CPP library.

Take a look at this answer: Using Unicode in C++ source code


See this MSDN article, which covers converting between string types (it should give you examples of how to use them). The string types covered include char *, wchar_t*, _bstr_t, CComBSTR, CString, basic_string, and System.String:

How to: Convert Between Various String Types
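A conversion closely related to the ones in that article is turning a UTF-8 char* into UTF-16, which is what wchar_t* strings hold on Windows. On Windows the usual route is MultiByteToWideChar with CP_UTF8; the portable sketch below instead decodes the bytes by hand, just to show what such a conversion involves. `utf8_to_utf16` is a hypothetical name, and the code assumes valid UTF-8 input (there is no error handling):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units. Assumes valid UTF-8 input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes after the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {  // outside the BMP: encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```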


There is a hotfix for Visual Studio 2010 SP1 that can help: http://support.microsoft.com/kb/980263.

The hotfix adds a pragma that overrides the character encoding Visual Studio uses for the char type:

#pragma execution_character_set("utf-8")

Without the pragma, char* literals are normally encoded in the system's default code page (typically 1252).
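A minimal probe for whether narrow literals come out as UTF-8, with the pragma applied only under MSVC. This assumes a compiler that honors the pragma (VC2010 SP1 with the hotfix, or later); GCC and Clang encode narrow literals as UTF-8 by default, so the probe should return true there. `narrow_literals_are_utf8` is a hypothetical name:

```cpp
#include <cassert>

#if defined(_MSC_VER)
// MSVC only: needs VS2010 SP1 with the KB980263 hotfix, or a later compiler.
#pragma execution_character_set("utf-8")
#endif

// Probe: are narrow (char) literals encoded as UTF-8 by this compiler setup?
// "\u00E4" is U+00E4; in UTF-8 that is the two bytes C3 A4 (plus the NUL).
// In code page 1252 it would be the single byte E4 instead.
bool narrow_literals_are_utf8() {
    const char probe[] = "\u00E4";
    return sizeof(probe) == 3 &&
           static_cast<unsigned char>(probe[0]) == 0xC3 &&
           static_cast<unsigned char>(probe[1]) == 0xA4;
}
```

In later Visual Studio versions, the /utf-8 compiler switch sets both the source and execution character sets to UTF-8, which has a similar effect.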

This should all eventually be superseded by the new string literal prefixes specified by C++0x (u8, u, and U for UTF-8, UTF-16, and UTF-32 respectively), which will ideally be supported in the next major version of Visual Studio after 2010.
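To make the difference between the three prefixes concrete, this sketch counts the code units each literal holds for a character inside the Basic Multilingual Plane (U+00E4) and one outside it (U+1F600). It assumes a C++11 compiler; `code_units` is a hypothetical helper, and the checks run at compile time:

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, not counting the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E4 ('a' with umlaut):
static_assert(code_units(u8"\u00E4") == 2, "UTF-8: 2 bytes");
static_assert(code_units(u"\u00E4") == 1, "UTF-16: 1 unit");
static_assert(code_units(U"\u00E4") == 1, "UTF-32: 1 unit");
// U+1F600 (an emoji outside the BMP):
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: 4 bytes");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32: 1 unit");
```

The same character takes a different number of code units in each form, so `sizeof` on a u8 or u literal measures code units, not characters.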


It is possible: save the source file as UTF-8 without a BOM (byte order mark).

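// Save this file as UTF-8 without BOM signature
#include <stdio.h>
#include <string.h>
#include <windows.h>

int main() {
    SetConsoleOutputCP(65001);            // 65001 = CP_UTF8: the console treats output bytes as UTF-8
    const char *c1 = "aäáéöő";            // the UTF-8 bytes from the source file end up in the literal
    char *c2 = new char[strlen(c1) + 1];  // strlen counts bytes; +1 for the terminating NUL
    strcpy(c2, c1);
    printf("%s\n", c1);
    printf("%s\n", c2);
    delete[] c2;
    return 0;
}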

Result:

 D:\Debug>program
aäáéöő
aäáéöő

When the program's output is redirected to a file, that file really is UTF-8 encoded.
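On the variable-length part of the question: strlen counts bytes, not characters, and the byte count is exactly what the allocation needs. Counting characters (code points) requires skipping UTF-8 continuation bytes instead. A sketch assuming valid UTF-8 input; `utf8_length` and `utf8_dup` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Count code points in a NUL-terminated UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_length(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    return n;
}

// Duplicate a UTF-8 string. strlen() counts bytes, which is what the
// allocation needs, regardless of how many bytes each character uses.
char* utf8_dup(const char* s) {
    std::size_t bytes = std::strlen(s);
    char* copy = new char[bytes + 1];   // +1 for the terminating NUL
    std::memcpy(copy, s, bytes + 1);    // copies the NUL as well
    return copy;
}
```

If the literal is UTF-8 encoded, `strlen("aäáéöő")` is 11 while `utf8_length("aäáéöő")` is 6, because each of ä, á, é, ö and ő takes two bytes.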



This is a compiler-independent answer (it compiles on Windows).
(A similar question.)
