Java UTF-16 Encoding code
The function that encodes a Unicode code point (an int) to a char array in Java is basically this:
return new char[] { (char) codePoint };
Which is just a cast from the integer value to a char.
I would like to know how this cast is actually done, that is, the code behind the cast that converts an integer value to a character encoded in UTF-16. I tried looking for it in the Java source code, but with no luck.
I'm not sure which function you're talking about.
Casting valid int code points to char will work for code points in the Basic Multilingual Plane, simply because of how UTF-16 was defined. To convert anything above U+FFFF you should use Character.toChars(int) to get the UTF-16 code units. The algorithm is defined in RFC 2781.
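For reference, here is a minimal sketch of that algorithm, equivalent in effect to Character.toChars(int). The class and helper names (Utf16Sketch, toUtf16) are just for illustration, not part of any library:

public class Utf16Sketch {
    static char[] toUtf16(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Not a valid code point: " + codePoint);
        }
        if (codePoint < 0x10000) {
            // BMP code points fit in a single UTF-16 code unit;
            // the cast is the whole conversion.
            return new char[] { (char) codePoint };
        }
        // Supplementary code points: subtract 0x10000, then split the
        // remaining 20 bits into two 10-bit halves (RFC 2781, section 2.1).
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 + (v >>> 10));   // high (lead) surrogate
        char low  = (char) (0xDC00 + (v & 0x3FF));  // low (trail) surrogate
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1F600 (GRINNING FACE) needs a surrogate pair.
        char[] units = toUtf16(0x1F600);
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DE00
    }
}

This produces the same pair that Character.toChars(0x1F600) returns.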
The code point is just a number that maps to a character; there's no real conversion going on. Unicode code points are conventionally written in hexadecimal, so whatever value codePoint holds will map to that character (or glyph).
Since a char is defined to hold UTF-16 data in Java, this is all there is to it. Only if the input is an int that can represent a Unicode code point of U+10000 or greater is some calculation necessary. All char values are already UTF-16.
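To make that concrete, here is a small demonstration (the specific code points are arbitrary examples): a cast works for a BMP character but silently truncates a supplementary one, whereas Character.toChars(int) produces the correct surrogate pair:

public class CastDemo {
    public static void main(String[] args) {
        int bmp = 0x00E9;                  // U+00E9, fits in one code unit
        char c = (char) bmp;               // fine: the cast is the whole encoding
        System.out.println(c);             // é

        int supplementary = 0x1D11E;       // U+1D11E MUSICAL SYMBOL G CLEF
        char truncated = (char) supplementary;              // narrows to the low 16 bits!
        System.out.println(Integer.toHexString(truncated)); // d11e, the wrong character

        char[] pair = Character.toChars(supplementary);     // { 0xD834, 0xDD1E }
        System.out.println(new String(pair));               // the G clef, correctly encoded
    }
}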
All chars in Java are represented internally in UTF-16. The cast just maps the integer value to that char.
Also, char arrays are already UTF-16 on the Java platform.
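As a quick check (a minimal sketch; the sample string is arbitrary), printing the code units of a String shows that the char[] already contains UTF-16, including a surrogate pair for the supplementary character:

public class CharArrayDemo {
    public static void main(String[] args) {
        String s = "A\uD83D\uDE00";   // U+0041 followed by U+1F600 (as a surrogate pair)
        for (char u : s.toCharArray()) {
            System.out.printf("%04X ", (int) u); // prints: 0041 D83D DE00
        }
    }
}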