More elegant way to decode \u0000 Unicode in an input stream

I'm parsing an input stream coming from Facebook. I'm using something like

BufferedReader in =
    new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

And then in.readLine to actually read from the stream.

The stream seems to have Unicode characters already encoded in ASCII, so I see things like \u00e4 (with \u actually being two discrete ASCII characters). Right now, I'm fishing for "\u", decoding the four hex digits (two bytes) that follow, turning them into a char, and replacing the escape sequence in the string with that char, which is obviously the worst way to do it.

I'm sure there's a cool way to use a native function to decode the special characters as the stream is being read (I was hoping it could be done on the InputStreamReader layer). But how?


The data format is JSON, which I didn't mention (and which Thanatos already assumed). Using Android's JSON parser will automatically decode the characters properly. Parsing JSON yourself is obviously a dumb idea on several levels.


If you see '\u00e4' with the '\' and the 'u' being separate characters, then the '0', '0', 'e' and '4' probably make up the 4 hex digits of a 2-byte (16-bit) Unicode code unit. The notation is based on C99; the alternative is '\U00XXYYZZ', where there are 8 hex digits representing a 32-bit UTF-32 character (but, because Unicode is a 21-bit code set, the first 2 of the 8 digits are always 0, and the next is often (usually) 0 too).
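To make the notation concrete, here is a minimal Java sketch of what decoding such an escape amounts to: the four hex digits are parsed as one 16-bit value and appended as a single char. The class and method names (UnicodeEscapes, decode) are made up for illustration. The sketch does not treat an escaped backslash specially, so it is not a substitute for a real JSON parser.

```java
final class UnicodeEscapes {
    // Replaces each backslash-u escape followed by exactly four hex digits
    // with the 16-bit char those digits name. Everything else is copied as-is.
    // Assumption: a doubled backslash in the input is NOT handled specially.
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 6 <= s.length()
                    && s.charAt(i + 1) == 'u'
                    && isHex4(s, i + 2);
            if (isEscape) {
                // Four hex digits -> one UTF-16 code unit.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the four characters starting at 'from' are all hex digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Calling UnicodeEscapes.decode on the nine ASCII characters a, backslash, u, 0, 0, e, 4, b, c gives a three-character string with ä in the middle. A character outside the 16-bit range arrives as two consecutive escapes (a surrogate pair); decoding each one separately leaves a valid pair in the Java String.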

However, that doesn't answer your question about what's the right Android way to read the data, and you are right that there probably is one.
