How to detect CString text encoding on iPhone/iPad?
I have a mixed set of CStrings in different text encodings.
Since I do not know the original encoding of each CString, how can I detect its text encoding on iPhone/iPad?
Thanks.
You cannot solve this problem in the general case without some additional information, because the same byte sequence can be valid in multiple encodings. For example, the hex values 48 45 4C 4C D4 decode to "HELLÔ" in ISO-8859-1 and to "HELLт" in KOI8-R. The 8-bit encodings are pretty much indistinguishable from one another, unless you start getting into heuristics like doing dictionary checks (hmmm... looks like Bulgarian).
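To make the ambiguity concrete, here is a minimal sketch in plain C that maps a single byte to a Unicode code point under each of the two encodings. The function names are made up for illustration, and the KOI8-R table is deliberately partial: it covers only ASCII and the lowercase Cyrillic range 0xC0..0xDF, and returns U+FFFD for everything else.

```c
#include <stdint.h>

/* ISO-8859-1: every byte maps to the code point with the same value. */
static uint32_t latin1_to_unicode(unsigned char b)
{
    return b;
}

/* KOI8-R, partial table (illustration only): ASCII plus the lowercase
   Cyrillic letters at 0xC0..0xDF. Other bytes yield U+FFFD. */
static uint32_t koi8r_to_unicode(unsigned char b)
{
    static const uint16_t lower[32] = {
        0x044E, 0x0430, 0x0431, 0x0446, 0x0434, 0x0435, 0x0444, 0x0433,
        0x0445, 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E,
        0x043F, 0x044F, 0x0440, 0x0441, 0x0442, 0x0443, 0x0436, 0x0432,
        0x044C, 0x044B, 0x0437, 0x0448, 0x044D, 0x0449, 0x0447, 0x044A
    };
    if (b < 0x80)
        return b;                  /* ASCII is shared by both encodings */
    if (b >= 0xC0 && b <= 0xDF)
        return lower[b - 0xC0];    /* lowercase Cyrillic */
    return 0xFFFD;                 /* not covered by this sketch */
}
```

Both functions accept the byte 0xD4 without complaint: the first gives U+00D4 (Ô), the second gives U+0442 (т). Nothing in the byte itself says which reading was intended.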
One strategy is to try UTF-8 first, and then fall back on a designated 8-bit encoding (e.g., ISO-8859-1) if the input fails to decode as UTF-8. (UTF-8 has byte sequences that are invalid, so there's a good chance that a non-ASCII string in some arbitrary 8-bit encoding will produce an error if you try to decode it as UTF-8.)
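A rough sketch of that strategy in plain C, which works directly on the raw CString bytes. The names `is_valid_utf8`, `guess_encoding`, and the `encoding_guess` enum are invented for this example, and the choice of fallback encoding is an assumption you have to make yourself:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the first len bytes of s are well-formed UTF-8.
   Rejects truncated sequences, stray continuation bytes, overlong
   forms, surrogates, and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;             /* continuation bytes expected */
        unsigned long cp;     /* code point being assembled */
        unsigned long min;    /* smallest code point allowed for this length */

        if (b < 0x80) { i++; continue; }                        /* ASCII */
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return false;    /* stray continuation or invalid lead byte */

        if (len - i - 1 < n)
            return false;     /* sequence is cut off at end of input */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false; /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;     /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return true;
}

typedef enum { GUESS_UTF8, GUESS_FALLBACK_8BIT } encoding_guess;

/* Try UTF-8 first; if the bytes are not valid UTF-8, report that the
   designated 8-bit fallback (e.g. ISO-8859-1) should be used. */
static encoding_guess guess_encoding(const char *cstr)
{
    return is_valid_utf8((const unsigned char *)cstr, strlen(cstr))
               ? GUESS_UTF8
               : GUESS_FALLBACK_8BIT;
}
```

With the bytes from the example above, `guess_encoding("HELL\xD4")` reports the fallback, because 0xD4 is a UTF-8 lead byte with nothing following it. Pure ASCII input is valid UTF-8 and every common 8-bit encoding agrees on it, so the guess is harmless there. This is still only a guess: a valid UTF-8 result does not prove the text was written as UTF-8.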
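The NSString class offers some encoding detection with +stringWithContentsOfFile:usedEncoding:error:, but that form is available only when loading from a file or URL. I'm not sure how many encodings it tries or how accurate it is. On iOS 8 and later, NSString also has +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:, which attempts detection on in-memory data; it is still a heuristic.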