Why do the same Unicode characters have different integer values, and vice versa?

I am trying to read a file in Xcode that contains a string of Unicode characters. My code traverses the string, picks up the characters one by one, and prints each character's corresponding int value. Below is the code that reads a small chunk of it.

NSString *theText = @"˘¸";  
for(int i=0; i<[theText length]; i++) {  
        int k = 249 + (i * 3);  
        NSLog(@"%c and %C >> UNICODE DEC-VAL >> %d",[theText characterAtIndex:i],[theText characterAtIndex:i],[theText characterAtIndex:i]);  
        NSLog(@"%c and %C >> UNICODE DEC-VAL >> %d",k,k,k);  
}

and its result is:

 ÿ and ˘ >> UNICODE DEC-VAL >> 728  
 ˘ and ù >> UNICODE DEC-VAL >> 249  
 ∏ and ¸ >> UNICODE DEC-VAL >> 184  
 ¸ and ü >> UNICODE DEC-VAL >> 252

There is an obvious ambiguity here: for the same integer value, the character printed differs depending on the format specifier, and the same Unicode character shows up with different integer values. I am just curious to know why this is so.

Thanks.


The %c format specifier expects a byte-sized (char) value. You're passing larger integers, which are converted to unsigned char, so only the low 8 bits survive. Aside from that, it looks like the output is being interpreted in a legacy 8-bit encoding (the glyphs match Mac OS Roman) rather than UTF-8, so each 8-bit byte has its own identity as a character, and those identities don't match the Unicode code points with the same numbers. In short, more than one thing is broken here.


%c prints a char, while %C prints a Unicode character (a unichar, i.e. a 16-bit UTF-16 code unit). I'm guessing the former uses some 8-bit encoding and that values above 255 are reduced modulo 256, so you always get an 8-bit character. %C always prints the character whose Unicode value you passed.

Also note that there are different Unicode characters with the same appearance.
