Reading and outputting UTF-8 strings in C/Cocoa

In an Objective-C/Cocoa app, I am using C functions to open a text file, read it line by line, and pass some of the lines to a third-party function. In pseudo-code:

char line[1024];
fgets(line, sizeof line, aFile);
library_function(line);  // This function calls for a utf-8 encoded char * string

This works fine until the input file contains non-ASCII characters (such as accented letters or the UTF-8 BOM), whereupon the library function outputs mangled characters.


However, if I do this:

char line[1024];
fgets(line, sizeof line, aFile);
NSString *stringObj = [NSString stringWithUTF8String:line];
library_function([stringObj UTF8String]);

Then it all works fine and the string is output correctly.


What is that [NSString... line doing that I'm not? Am I doing something wrong with how the line is fetched initially? Or is it something else entirely?


UTF-8 is a variable-width, multi-byte encoding of Unicode (see Wikipedia), which means some characters require more than one byte (the accented ones you've run into). C's char type is a single byte, so C's definition of "character" doesn't match Unicode's.
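To make the byte/character mismatch concrete, here is a minimal sketch that counts Unicode code points in a UTF-8 string. The function name utf8_codepoint_count is hypothetical (it is not part of any library), and the sketch assumes the input is already valid UTF-8; it does no validation.

```c
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
   UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx;
   every other byte starts a new code point.
   Assumption: the input is valid UTF-8 (nothing is validated here). */
size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        /* Skip continuation bytes, count everything else. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "caf\xC3\xA9" ("café" in UTF-8), strlen reports 5 bytes while this function reports 4 code points, because the accented letter occupies two bytes.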

If you want to process Unicode text with the standard C runtime library, you'll also need a Unicode conversion library, such as libiconv, to convert between encodings.
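As a rough sketch of what a conversion with iconv might look like, the function below converts an ISO-8859-1 string to UTF-8. The source encoding is purely an assumption for illustration (the question does not say how the file is encoded), and latin1_to_utf8 is a hypothetical helper name. The calls iconv_open, iconv, and iconv_close are the standard POSIX interface; on macOS you would typically link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
   Assumption: the input really is ISO-8859-1.
   Writes a NUL-terminated result into `out`.
   Returns 0 on success, -1 on any failure (including a too-small buffer). */
int latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");  /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv takes char **, hence the cast */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size - 1;  /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

Called with the four-byte input "caf\xE9", this should produce the five-byte UTF-8 string "caf\xC3\xA9", which could then be passed to a function expecting UTF-8.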

(Using wchar_t may also work; I've never researched it.)

Or you can use NSString, which already supports Unicode.
