
Objective-C - does not read UTF-8 encoded file

I'm trying to display some Japanese text on the iOS Simulator and on an iPod touch. The text is read from an XML file whose header is:

<?xml version="1.0" encoding="utf-8"?>

When the text is in English, it displays fine. However, when the text is Japanese, it comes out as an unintelligible mishmash of single-byte characters.

I have tried explicitly saving the file as Unicode using TextEdit. I'm using NSXMLParser to parse the data. Any ideas would be much appreciated.

Here is the parsing code:

    // Override point for customization after application launch.

    NSString *xmlFilePath = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"questionsutf8.xml"];
    NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath];

    NSData *data = [NSData dataWithBytes:[xmlFileContents UTF8String] length:[xmlFileContents lengthOfBytesUsingEncoding: NSUTF8StringEncoding]];

    XMLReader *xmlReader = [[XMLReader alloc] init];

    [xmlReader parseXMLData: data];


stringWithContentsOfFile: is deprecated. It does no encoding detection unless the file starts with the appropriate byte order mark; otherwise it interprets the file in the default C string encoding (the encoding returned by the +defaultCStringEncoding method). Use the non-deprecated, encoding-detecting method stringWithContentsOfFile:usedEncoding:error: instead.

You can use it like this:

    NSStringEncoding enc;
    NSError *error;
    // On success, enc receives the encoding used to read the file; on failure, error is set.
    NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath
                                                      usedEncoding:&enc
                                                             error:&error];

    if (xmlFileContents == nil)
    {
        NSLog (@"%@", error);
        return;
    }
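The answer above turns on whether the file starts with a byte order mark. Below is a minimal Objective-C sketch, assuming ARC. ByteOrderMarkName and DecodedLength are hypothetical helper names, not part of Foundation. The first reports which BOM, if any, the file's data begins with (UTF-32 marks are ignored for brevity). The second shows why the question's Japanese text became a mishmash. ISO Latin 1 stands in here for a single-byte default C string encoding; the actual value of +defaultCStringEncoding on a device may differ.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: names the byte order mark at the start of `data`, or
// returns nil if there is none. UTF-32 marks are not checked.
static NSString *ByteOrderMarkName(NSData *data) {
    const unsigned char *bytes = [data bytes];
    NSUInteger length = [data length];
    if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        return @"UTF-8";
    }
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        return @"UTF-16 big-endian";
    }
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        return @"UTF-16 little-endian";
    }
    return nil;
}

// Hypothetical helper: how many NSString characters (UTF-16 units) the bytes
// decode to in `encoding`, or NSNotFound if they are invalid in that encoding.
static NSUInteger DecodedLength(NSData *data, NSStringEncoding encoding) {
    NSString *decoded = [[NSString alloc] initWithData:data encoding:encoding];
    return decoded != nil ? [decoded length] : NSNotFound;
}
```

For the nine UTF-8 bytes of 日本語, DecodedLength should return 3 with NSUTF8StringEncoding but 9 with NSISOLatin1StringEncoding. A single-byte encoding turns each byte into a separate character, so every Japanese character becomes three unrelated ones.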


First, you should verify with TextWrangler (free from the Mac App Store or barebones.com) that your XML file really is UTF-8 encoded.
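If you would rather check from code than in TextWrangler, here is a minimal sketch, assuming ARC. IsValidUTF8 and FileIsValidUTF8 are hypothetical names. The sketch relies on -[NSString initWithData:encoding:] returning nil when the bytes are not valid in the requested encoding. Passing this check only shows that the bytes are well-formed UTF-8. A pure-ASCII file passes too, because ASCII is a subset of UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `data` is well-formed UTF-8. Relies on
// -initWithData:encoding: returning nil for bytes that are invalid in the
// requested encoding.
static BOOL IsValidUTF8(NSData *data) {
    NSString *decoded = [[NSString alloc] initWithData:data
                                              encoding:NSUTF8StringEncoding];
    return decoded != nil;
}

// Hypothetical helper: checks a file on disk, such as questionsutf8.xml.
// A file that cannot be read is reported as NO.
static BOOL FileIsValidUTF8(NSString *path) {
    NSData *data = [NSData dataWithContentsOfFile:path];
    return data != nil && IsValidUTF8(data);
}
```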

Second, try creating xmlFileContents with +stringWithContentsOfFile:encoding:error:, explicitly specifying NSUTF8StringEncoding. Better still, bypass the intermediate string entirely and create data directly with +dataWithContentsOfFile:, so the parser receives the file's bytes unchanged.
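Here is a rough sketch of the data-based approach, assuming ARC. TextCollector and the helper functions are hypothetical stand-ins, since the question's XMLReader class is not shown. The bundle helper keeps the question's resourcePath lookup. NSXMLParser receives the raw bytes and should pick up encoding="utf-8" from the XML declaration itself (UTF-8 is also the XML default). The last helper shows the other option, decoding explicitly as UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects all character data; a stand-in for the
// question's XMLReader, whose implementation is not shown.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong, readonly) NSMutableString *text;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [[NSMutableString alloc] init];
    }
    return self;
}

// Called for runs of text between tags; may be called several times per element.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [_text appendString:string];
}
@end

// Hypothetical helper: parses raw XML bytes without an intermediate NSString,
// so the parser decodes them according to the XML declaration.
static NSString *CollectTextFromXMLData(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TextCollector *collector = [[TextCollector alloc] init];
    [parser setDelegate:collector];  // not retained by the parser; kept alive by this local
    if (![parser parse]) {
        NSLog(@"Parse failed: %@", [parser parserError]);
        return nil;
    }
    return [[collector text] copy];
}

// Hypothetical helper: loads a bundled file as NSData, mirroring the
// question's resourcePath lookup, and parses it.
static NSString *CollectTextFromBundledXML(NSString *fileName) {
    NSString *path = [[[NSBundle mainBundle] resourcePath]
                         stringByAppendingPathComponent:fileName];
    NSData *data = [NSData dataWithContentsOfFile:path];
    if (data == nil) {
        NSLog(@"Could not read %@", path);
        return nil;
    }
    return CollectTextFromXMLData(data);
}

// Hypothetical helper for the other option: decode explicitly as UTF-8, then
// re-encode for the parser. Returns nil if the file cannot be read or is not
// valid UTF-8 (Foundation fills in *error in that case).
static NSData *UTF8DataFromFile(NSString *path, NSError **error) {
    NSString *contents = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:error];
    return [contents dataUsingEncoding:NSUTF8StringEncoding];
}
```

With the question's own class, the equivalent change is to pass [NSData dataWithContentsOfFile:xmlFilePath] straight to parseXMLData: and drop the stringWithContentsOfFile: and dataWithBytes:length: lines.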
