Java reading in character streams with supplementary Unicode characters

2023-04-12 16:46

I'm having trouble reading supplementary Unicode characters in Java. I have a file that may contain characters in the supplementary set (anything greater than \uFFFF). When I set up my InputStreamReader to read the file as UTF-8, I would expect the read() method to return a single character for each supplementary character. Instead, it seems to split them at the 16-bit threshold.

I saw some other questions about basic Unicode character streams, but nothing seems to deal with the case of characters wider than 16 bits.

Here's some simplified sample code:

InputStreamReader input = new InputStreamReader(file, "UTF8");
int nextChar = input.read();
while(nextChar != -1) {
    ...
    nextChar = input.read();
}

Does anyone know what I need to do to correctly read in a UTF-8 encoded file that contains supplementary characters?

Java works with UTF-16. So, if your input stream contains astral (supplementary) characters, each one will appear as a surrogate pair, i.e., as two chars. The first char is the high surrogate, and the second char is the low surrogate.
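One way to get whole code points out of a Reader is to recombine the surrogate pair yourself. The following is only a minimal sketch, not the asker's code: readCodePoints and the class name are hypothetical helpers introduced for illustration, and the input is an in-memory byte array rather than a real file. It relies on the standard Character.isHighSurrogate, Character.isLowSurrogate and Character.toCodePoint methods.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CodePointReaderDemo {

    // Hypothetical helper (not part of the JDK): reads UTF-16 code units
    // from the Reader and merges each surrogate pair into one code point.
    static List<Integer> readCodePoints(Reader input) throws IOException {
        List<Integer> codePoints = new ArrayList<>();
        int next = input.read();
        while (next != -1) {
            char unit = (char) next;
            int codePoint = unit;
            if (Character.isHighSurrogate(unit)) {
                int following = input.read();
                if (following != -1 && Character.isLowSurrogate((char) following)) {
                    // Valid pair: combine both halves into a single code point.
                    codePoint = Character.toCodePoint(unit, (char) following);
                    next = input.read();
                } else {
                    // Unpaired high surrogate: keep it as-is and process
                    // the following unit on the next loop iteration.
                    next = following;
                }
            } else {
                next = input.read();
            }
            codePoints.add(codePoint);
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        // "a", U+1D11E (MUSICAL SYMBOL G CLEF), "b", encoded as UTF-8.
        byte[] utf8 = "a\uD834\uDD1Eb".getBytes(StandardCharsets.UTF_8);
        try (Reader input = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            for (int codePoint : readCodePoints(input)) {
                System.out.printf("U+%04X%n", codePoint);
            }
        }
    }
}
```

With this input the loop yields three code points (U+0061, U+1D11E, U+0062) even though read() is called four times, because the two surrogate halves are merged into one value.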

Though read() is defined to return int, and could theoretically return a supplementary character's code point "all at once", I believe the return type is only int to allow a value of -1 to be returned.

The value you're getting from read() is basically a char by another name, and in Java a char is limited to 16 bits.

In char terms, Java can only represent a supplementary character as a UTF-16 surrogate pair; there is no such thing as a "single character" (at least in the char sense) once you get above 0xFFFF as far as Java is concerned.
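To make the char-versus-code-point distinction concrete, here is a small sketch using only standard java.lang methods (Character.toChars, String.length, String.codePointCount). The describe helper and the class name are hypothetical and exist only for this illustration.

```java
class SupplementaryCharDemo {

    // Hypothetical helper: shows how many chars (UTF-16 code units)
    // a single code point occupies, and what those units are.
    static String describe(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("U+%04X", codePoint));
        sb.append(" -> ").append(units.length).append(" char(s):");
        for (char unit : units) {
            sb.append(String.format(" %04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x41));     // U+0041 -> 1 char(s): 0041
        System.out.println(describe(0x1D11E));  // U+1D11E -> 2 char(s): D834 DD1E

        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                          // 2
        System.out.println(clef.codePointCount(0, clef.length()));  // 1
    }
}
```

The code point itself fits comfortably in an int, which is why the code-point-oriented methods on Character and String take and return int rather than char.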
