How to convert Unicode strings (\u00e2, etc) into NSString for display?
I am trying to support arbitrary Unicode from a variety of international users. They have already put a bunch of data into SQLite databases on their iPhones, and now I want to capture the data into a database, then send it back to their device. Right now I am using a PHP page that sends data back from an internet MySQL database. The data is saved in the MySQL database properly, but when it's sent back it comes out as escaped Unicode text, such as
Frank\u00e2\u0080\u0099s iPad
instead of just
Frank's iPad
where the apostrophe should really be a curly apostrophe.
The answer posted to another question indicates that there are no built-in Cocoa methods to convert the "\u00e2\u0080\u0099" portion of the Unicode string from the web server to an NSString object. Is this correct?
That seems really surprising (and scarily disappointing), since Cocoa definitely allows input from many different Unicode characters, and I need to support any arbitrary language that I have never heard of, and all of the possible characters. I save them to and from the local sqlite database just fine now, but once I send it to a web server, then perhaps pull down different data, I want to ensure the data pulled from the web server is correctly formatted.
[...] there are no built-in Cocoa methods to convert [...]. Is this correct?
It's not correct.
You might be interested in CFStringTransform and its capabilities. It is a full-blown ICU transformation engine, which can (also) perform your requested transformation.
See Using Objective C/Cocoa to unescape unicode characters, ie \u1234
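For illustration, here is a minimal sketch of the CFStringTransform approach, not a definitive implementation. UnescapeUnicode and RepairUTF8ReadAsLatin1 are hypothetical helper names. The second helper rests on an assumption about this particular data: \u00e2\u0080\u0099 looks like the three UTF-8 bytes of U+2019 (E2 80 99), each escaped as its own character, so after unescaping, the text may still need to be reinterpreted as UTF-8.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces literal "\uXXXX" escapes with the characters
// they name by running the ICU "Any-Hex/Java" transform in reverse.
static NSString *UnescapeUnicode(NSString *escaped) {
    NSMutableString *result = [escaped mutableCopy];
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)result,
                                   NULL,                  // transform the whole string
                                   CFSTR("Any-Hex/Java"), // forward: char -> \uXXXX
                                   true);                 // reverse: \uXXXX -> char
    return ok ? result : nil;
}

// Hypothetical helper, only needed if the assumption above holds: each
// character is really one UTF-8 byte, so write the characters back out as
// single bytes (Latin-1) and decode those bytes as UTF-8.
// Returns nil if the string has characters above U+00FF or the bytes are
// not valid UTF-8, in which case the input was probably fine already.
static NSString *RepairUTF8ReadAsLatin1(NSString *garbled) {
    NSData *bytes = [garbled dataUsingEncoding:NSISOLatin1StringEncoding];
    if (bytes == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
}
```

A call would look like RepairUTF8ReadAsLatin1(UnescapeUnicode(serverString)), falling back to the unescaped string when the repair step returns nil. If the repair step turns out to be necessary, that is a hint the text is being decoded with the wrong charset somewhere on the server side, which is better fixed there.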
All NSStrings are Unicode.
The problem with the “Frank\u00e2\u0080\u0099s iPad” data isn't that it's Unicode; it's that it's escaped to ASCII. “Frank’s iPad” is valid Unicode in any UTF, and is what you need.
So, you need to see whether the database is returning the data escaped or the PHP layer is escaping it at some point. If either of those is the case, fix it if you can; the PHP resource should return UTF-8/16/32. Only if that approach fails should you seek to unescape the string on the Cocoa side.
You're correct that there is no built-in way to unescape the string in Cocoa. If you get to that point, see if you can find some open-source code to do it; if not, you'll need to do it yourself, probably using NSScanner.
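If it does come to writing it yourself, a rough sketch with NSScanner might look like the following. UnescapeWithScanner is a hypothetical name, and the sketch assumes the only escapes present are \u followed by exactly four hex digits; it does not handle escaped backslashes or other escape sequences.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: copies the input, replacing each "\uXXXX" escape
// with the UTF-16 code unit it names. Anything else is copied unchanged.
static NSString *UnescapeWithScanner(NSString *input) {
    NSMutableString *output = [NSMutableString string];
    NSCharacterSet *nonHex = [[NSCharacterSet
        characterSetWithCharactersInString:@"0123456789abcdefABCDEF"] invertedSet];

    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil; // keep whitespace in the text

    while (!scanner.isAtEnd) {
        // Copy everything up to the next "\u" (or to the end of the string).
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"\\u" intoString:&chunk]) {
            [output appendString:chunk];
        }
        if (scanner.isAtEnd) {
            break;
        }

        NSUInteger location = scanner.scanLocation; // index of the backslash
        if (location + 6 <= input.length) {
            NSString *hex = [input substringWithRange:NSMakeRange(location + 2, 4)];
            if ([hex rangeOfCharacterFromSet:nonHex].location == NSNotFound) {
                unsigned int value = 0;
                [[NSScanner scannerWithString:hex] scanHexInt:&value];
                unichar unit = (unichar)value;
                [output appendString:[NSString stringWithCharacters:&unit length:1]];
                scanner.scanLocation = location + 6; // skip "\uXXXX"
                continue;
            }
        }

        // Not a well-formed escape: keep the "\u" literally and move on.
        [output appendString:@"\\u"];
        scanner.scanLocation = location + 2;
    }
    return output;
}
```

Surrogate pairs written as two consecutive escapes end up adjacent in the output, so they combine into the intended character without special handling.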
Check that your web service response specifies a Content-Type and charset, and that the XML declares its encoding. In PHP, add the following before printing the XML:
header('Content-type: text/xml; charset=UTF-8');
print '<?xml version="1.0" encoding="UTF-8"?>';
I guess there is just no encoding specified.