Java UTF-16 Encoding code

The function that encodes a Unicode code point (an int) into a char array in Java is basically this:

return new char[] { (char) codePoint };

That is just a cast from the integer value to a char.

I would like to know how this cast is actually done, i.e. the code behind the cast that converts an integer value into a character encoded in UTF-16. I tried looking for it in the Java source code, but with no luck.


I'm not sure which function you're talking about.

Casting valid int code points to char works for code points in the Basic Multilingual Plane, simply because of how UTF-16 is defined. To convert anything above U+FFFF you should use Character.toChars(int) to obtain the UTF-16 code units. The algorithm is defined in RFC 2781.
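
For example, a minimal sketch of Character.toChars on a supplementary code point (U+1F600 is chosen arbitrarily for illustration):

int codePoint = 0x1F600;                     // above U+FFFF, so two code units are needed
char[] units = Character.toChars(codePoint); // yields the surrogate pair { 0xD83D, 0xDE00 }
System.out.printf("%04x %04x%n", (int) units[0], (int) units[1]);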


The code point is just a number that maps to a character; there's no real conversion going on. Unicode code points are specified in hexadecimal, so whatever your codePoint is in hex will map to that character (or glyph).
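
For instance (a minimal illustration; the value U+00E9 is chosen arbitrarily):

int codePoint = 0x00E9;    // U+00E9, LATIN SMALL LETTER E WITH ACUTE
char c = (char) codePoint; // 'é': the cast simply reinterprets the number as a UTF-16 code unit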


Since a char is defined to hold UTF-16 data in Java, this is all there is to it. Some calculation is only necessary if the input is an int representing a Unicode code point of U+10000 or greater. All char values are already UTF-16.
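
As a sketch of that calculation, here is the surrogate-pair split from RFC 2781, which is essentially what Character.toChars does for supplementary code points (the helper name toSurrogatePair is made up for illustration):

// Illustrative only: split a code point >= U+10000 into a UTF-16 surrogate pair.
static char[] toSurrogatePair(int codePoint) {
    int offset = codePoint - 0x10000;               // leaves a 20-bit value
    char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits -> high surrogate
    char low  = (char) (0xDC00 + (offset & 0x3FF)); // low 10 bits -> low surrogate
    return new char[] { high, low };
}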


All chars in Java are represented internally in UTF-16. This is just mapping the integer value to that char.


Also, char arrays are already UTF-16 in the Java platform.
