C++: convert from UTF-8 to wstring using iconv

I have a C++ Linux application which runs the following:

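#include <cerrno>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <iconv.h>

int main()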
{
  using namespace std;
  char str[] = "¡Hola!";

  wchar_t wstr[50];

  size_t rc;

  memset(wstr, 0, sizeof(wstr));

  rc = mbstowcs(wstr, str, 50);

  cout << "mbstowcs results: ";
  cout << "rc = " << rc << endl;
  cout << "str:" << str  << endl;
  wcout << L"wstr:" << wstr  << endl;
  setlocale(LC_CTYPE,"");
  iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
  cout << "iconv_open errno = "<< errno << endl;

  char *s = str;
  char *t = (char *)wstr;
  size_t s1 = strlen(str);
  size_t s2 = 50;

  rc = iconv(cd, &s, &s1, &t, &s2);

  cout << "iconv results: ";
  cout << "rc = " << rc << endl;
  cout << "str:" << str  << endl;
  wcout << L"wstr:" << wstr  << endl;

}

I want to convert a UTF-8 char array to a wstring, but the above code returns this result:

    mbstowcs results: rc = 18446744073709551615
    str:¡Hola!
    wstr:
    iconv_open errno = 2
    iconv results: rc = 0
    str:¡Hola!
    wstr:�Hola!

The iconv result has the first character converted to a different character.

Note: if I replace WCHAR_T with UCS-4-INTERNAL, wstr contains nothing.

Any help?

Thanks!


Is it possible to use boost?

http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/codecvt.html


Without having looked at the iconv documentation (I have never had to use it so far), I'd expect that your input (char str[] = "¡Hola!";) is not encoded as a multibyte string. More likely it is a plain single-byte ("ANSI") string that uses your local code page to represent the '¡'. In other words, in your existing char array the '¡' is stored in a single byte with a value above 127. mbstowcs(), however, expects the multibyte encoding of the current locale. If that encoding is UTF-8, a proper '¡' takes two bytes (0xC2 0xA1), and a lone byte above 127 is not a valid sequence at all.

I'd expect the error to happen there: mbstowcs() should return the number of wide characters in the converted string, but 18446744073709551615 is far too large for that. It is (size_t)-1 on a 64-bit system, which is the value mbstowcs() returns when it encounters an invalid multibyte sequence. If this is true, you should also be able to check the iconv result properly by defining your own wide string with the intended text (wchar_t wstr[] = L"¡Hola!";) and using that one instead.
