Reading and outputting UTF-8 strings in C/Cocoa
In an Objective-C/Cocoa app, I am using C functions to open a text file, read it line by line, and pass some of the lines to a third-party function. Simplified:
char line[1024];
fgets(line, sizeof line, aFile);
library_function(line); // This function expects a UTF-8 encoded char * string
This works fine until the input file contains special characters (such as accents or the UTF-8 BOM) whereupon the library function outputs mangled characters.
However, if I do this:
char line[1024];
fgets(line, sizeof line, aFile);
NSString *stringObj = [NSString stringWithUTF8String:line];
library_function([stringObj UTF8String]);
Then everything works and the string is output correctly.
What is that [NSString ...] line doing that I'm not?
Am I doing something wrong with how the line is fetched initially? Or is it something else entirely?
UTF-8 is a multi-byte encoding (see Wikipedia), which means some characters, including the accented ones you've run into, take more than one byte. C's char type is a single byte, so C's notion of a "character" doesn't match Unicode's.
If you want to work with Unicode characters, rather than raw bytes, using the standard C runtime library, you'll also need a Unicode conversion library such as libiconv.
(Using wchar_t may also work; I've never researched it.)
Or you can use NSString, which already supports Unicode.