NSString Unicode encoding problem
I'm having problems converting the string to something readable. I'm using
NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];
but I can't convert \U7ab6\U51b1 into an apostrophe (').
It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?
it is shown as a ’
That's character U+2019 RIGHT SINGLE QUOTATION MARK.
What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as the bytes:
’        s
E2 80 99 73
That byte sequence has then, incorrectly, been interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):
E2 80    99 73
窶       冱
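The misreading can be reproduced in a few lines. This is only a sketch, written in Python to show the byte-level behaviour rather than anything NSString-specific. The variable names are made up for illustration, and it assumes Python's built-in cp932 codec matches whatever decoder was applied to your data.

```python
# Illustrative sketch: reproduce the misreading described above.
original = "\u2019s"                      # U+2019 RIGHT SINGLE QUOTATION MARK, then "s"

utf8_bytes = original.encode("utf-8")     # the bytes that were actually submitted
hex_bytes = utf8_bytes.hex(" ").upper()   # needs Python 3.8+ for the separator argument

# Decoding the UTF-8 bytes as code page 932 pairs them up as E2 80 and 99 73,
# so four bytes turn into two unrelated kanji.
misread = utf8_bytes.decode("cp932")
code_points = [f"U+{ord(ch):04X}" for ch in misread]

print(hex_bytes)      # E2 80 99 73
print(code_points)    # ['U+7AB6', 'U+51B1']
```

Those two code points are the \U7ab6\U51b1 from the question.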
So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes and then decoding those bytes back to characters using UTF-8.
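As a rough sketch of that round trip, again in Python for illustration only: `undo_cp932_misread` is a hypothetical helper name, not a library function, and it assumes the text was mangled by exactly this cp932-for-UTF-8 mix-up.

```python
# Illustrative sketch: undo one specific cp932-for-UTF-8 mix-up.
def undo_cp932_misread(mangled: str) -> str:
    """Re-encode to the bytes that were misread, then decode them as UTF-8.

    Raises UnicodeError if the text does not survive the round trip, which
    is expected for input that was never mangled this way.
    """
    raw = mangled.encode("cp932")   # back to the bytes that originally arrived
    return raw.decode("utf-8")      # interpret those bytes as intended

recovered = undo_cp932_misread("\u7ab6\u51b1")   # the two kanji from the question
print(f"U+{ord(recovered[0]):04X}", recovered[1])   # U+2019 s
```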
However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.
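To make the failure case concrete, here is a sketch of the same mix-up applied to ’ followed by a space (as in "dogs’ tails"). It assumes Python's cp932 codec; other decoders may substitute or drop bytes differently, but the information is lost either way.

```python
# Illustrative sketch: the same mix-up, but with a space after the quotation mark.
utf8_bytes = "\u2019 ".encode("utf-8")    # E2 80 99 20

try:
    utf8_bytes.decode("cp932")
    strict_ok = True
except UnicodeDecodeError:
    # 0x99 is a cp932 lead byte, but 0x20 is not a valid trail byte.
    strict_ok = False

# A lenient decoder hides the error by substituting U+FFFD for the bad byte...
lossy = utf8_bytes.decode("cp932", errors="replace")

# ...after which the original bytes can no longer be reconstructed.
try:
    lossy.encode("cp932").decode("utf-8")
    recoverable = True
except UnicodeError:
    recoverable = False

print(strict_ok, recoverable)   # False False
```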
You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.