String conversion from UTF-8 to UTF-16 Big endian is failing (using C, C++ language)
I am using the GLib function g_convert() to convert a UTF-8 string to a UTF-16 big-endian string. The conversion fails with an error saying "conversion is not supported".
Could someone give me a clue on how to overcome this issue?
Thanks
The following piece of code is used to convert the string from UTF-8 to UTF-16 big-endian:
unsigned short *result_str;
gsize bytes_read, bytes_written;
gssize len = 0;
GError *error = NULL;
result_str = (unsigned short *)g_convert("text data", len, "UTF-16BE", "UTF-8", &bytes_read, &bytes_written, &error);
Your len is 0. The GLib manual says that len is the length of the input in bytes and must be -1 for a NUL-terminated string.
g_convert uses iconv under the covers.
On my machine, using Cygwin, I can run
iconv -l
which lists the supported encodings, and UTF-16BE does appear in the list. However:
$ iconv -l | grep BE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-4BE
UTF-16BE
UTF-32BE
James@XPL3KWK28 ~
$ iconv -f UTF-8 -t UTF16-BE
iconv: conversion to UTF16-BE unsupported
iconv: try 'iconv -l' to get the list of supported encodings
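As you can see, iconv reports this conversion as unsupported. Note that the name typed above is UTF16-BE, while the list shows UTF-16BE, so it is worth retrying with the listed spelling before drawing conclusions.
If it still fails, you probably need to do this in two stages: UTF-8 to UTF-16, then UTF-16 to UTF-16BE.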
I suspect UTF-16BE is not supported by g_convert (based on the error message). It's trivial to convert UTF-8 into UTF-16BE though (no tables or other garbage like that) -- you can do that transformation yourself.
You might also want to check if UTF-16 is supported and do your own byte swapping if necessary. But I do not believe g_convert supports UTF-16 either.
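As an illustration of doing the transformation yourself, here is a minimal sketch in plain C, with no GLib or iconv involved. The function name utf8_to_utf16be and its return convention are assumptions made for this example, not an existing API. It decodes each UTF-8 sequence to a code point and writes the UTF-16 code units high byte first, using a surrogate pair for code points above U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GLib): converts a NUL-terminated UTF-8
 * string to UTF-16BE bytes in out. Returns the number of bytes written, or
 * (size_t)-1 if the input is malformed or out is too small.
 * No BOM and no terminator are written. */
size_t utf8_to_utf16be(const char *in, unsigned char *out, size_t out_cap)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*s) {
        uint32_t cp;
        int extra, i;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)   /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            /* One 16-bit unit, high byte first. */
            if (out_cap - n < 2) return (size_t)-1;
            out[n++] = (unsigned char)(cp >> 8);
            out[n++] = (unsigned char)(cp & 0xFF);
        } else {
            /* Surrogate pair, each unit high byte first. */
            uint32_t v  = cp - 0x10000;
            uint32_t hi = 0xD800 | (v >> 10);
            uint32_t lo = 0xDC00 | (v & 0x3FF);
            if (out_cap - n < 4) return (size_t)-1;
            out[n++] = (unsigned char)(hi >> 8);
            out[n++] = (unsigned char)(hi & 0xFF);
            out[n++] = (unsigned char)(lo >> 8);
            out[n++] = (unsigned char)(lo & 0xFF);
        }
    }
    return n;
}
```

Because the bytes are written explicitly in big-endian order, the result does not depend on the byte order of the host, so no separate swapping step is needed.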
Looks like your system does not support that conversion. (This error means that iconv_open() failed with EINVAL.)
On my Linux system it does appear to be supported:
echo "Hello" | iconv --from-code UTF-16BE --to-code UTF-8
(obviously "Hello" is not a valid UTF-16 string, but it does get converted to something, so the actual conversion seems to be supported)
See if you have UTF-16BE in "iconv --list"
In this particular case your simplest solution might be to just use g_utf8_to_utf16(): http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html#g-utf8-to-utf16
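You can easily do your own byte swap. Untested code, where len is the number of 16-bit units in result_str (the items_written value from g_utf8_to_utf16(), not a byte count):
glong i;
/* GUINT16_TO_BE is already a no-op on big-endian hosts; the check just skips the loop. */
if (G_BYTE_ORDER != G_BIG_ENDIAN) {
    for (i = 0; i < len; ++i) {
        result_str[i] = GUINT16_TO_BE(result_str[i]);
    }
}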
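For readers without the GLib macros at hand, the loop above can be sketched in plain C. The helper name utf16_host_to_be is an assumption made for this example. It rewrites each 16-bit unit so that its high byte comes first in memory, which gives the same result on little-endian and big-endian hosts and therefore needs no byte-order check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts `count` host-order UTF-16 code units to
 * big-endian byte order, in place. */
void utf16_host_to_be(uint16_t *units, size_t count)
{
    size_t i;

    assert(units != NULL || count == 0);

    for (i = 0; i < count; ++i) {
        uint16_t u = units[i];                    /* read the value first */
        unsigned char *p = (unsigned char *)&units[i];
        p[0] = (unsigned char)(u >> 8);           /* high byte first */
        p[1] = (unsigned char)(u & 0xFF);         /* then low byte */
    }
}
```

After the call the buffer should be treated as raw bytes in UTF-16BE order rather than as host-order integers.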