开发者

Unfathomable problem with unicode and frameworks

I'm experien开发者_开发问答cing a very strange problem... The following trivial test code works as it should if it is injected in a single Cocoa application, but when I use it in one of my frameworks, I get absolutely unexpected results...

wchar_t Buf[2048];
wcscpy(Buf, L"/zbxbxklbvasyfiogkhgfdbxbx/bxkfiorjhsdfohdf/xbxasdoipppwejngfd/gjfdhjgfdfdjkg.sdfsdsrtlrt.ljlg/fghlfg");
int len1 = wcslen(L"/zbxbxklbvasyfiogkhgfdbxbx/bxkfiorjhsdfohdf/xbxasdoipppwejngfd/gjfdhjgfdfdjkg.sdfsdsrtlrt.ljlg/fghlfg");
int len2 = wcslen(Buf);

char Buf2[2048];
Buf2[0]=0;
wcstombs(Buf2, Buf, 2048);

// ??? Buf2 == ""
// ??? len1 == len2 == 57, but should be 101

How can this be, have I gone mad? Even if there was a memory corruption, it couldn't possibly corrupt all these values allocated on stack... Why won't even the wcslen(L"MyWideString") work? Changing test string changes its length, but it is always wrong, wcstombs returns -1...

setlocale() is not used anywhere, test string contains only ASCII characters, in order to ease porting I use the -fshort-wchar compiler option, but it works fine in case of a test Cocoa application...

Please help!


I've just tested this again with GCC 4.6. In the standard settings, this works as expected, giving 101 for all the lengths. However, with your option -fshort-wchar I also get unexpected results (51 in my case, and 251 for the final conversion after using setlocale()).

So I looked up the man entry for the option:

Warning: the -fshort-wchar switch causes GCC to generate code that is not binary compatible with code generated without that switch. Use it to conform to a non-default application binary interface.

I think that explains it: When you're linking to the standard library, you are expected to use the correct ABI and type conventions, which you are overriding with that option.


Wide char implementation in C/C++ can be anything, including 1 byte, 2 bytes or 4 bytes. This depends on the compiler and the platform you are compiling to.

Probably wikipedia is not the best place to quote from but in this case: http://en.wikipedia.org/wiki/Wide_character states that

... width of wchar_t is compiler-specific and can be as small as 8 bits.

and

... wide characters should be 16-bit values under C90 due to historical compatibility reasons. C and C++ compilers that comply with the 10646-1:2000 Unicode standard generally assume 32-bit values....

So, do not assume and use the sizeof(wchar_t).


-fshort-wchar change the compiler's ABI, so you need to recompile glibc, libgcc and all library using wchar_t. Otherwise, wcslen and other functions in glibc are still assume wchar_t is 4 bytes.

see: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42092

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜