
Java unicode where to find example N-byte unicode characters

I'm looking for sample 1-byte, 2-byte, 3-byte, 4-byte, 5-byte, and 6-byte unicode characters. Any links to some sort of reference of all the different unicode characters out there and how big they are (byte-wise) would be greatly appreciated. I'm hoping this reference also has code points like \uXXXXX.


There is no such thing as "1-byte, 2-byte, 3-byte, 4-byte, 5-byte, and 6-byte unicode characters".

You are probably talking about the UTF-8 representations of Unicode characters. Similarly, strings in Java are internally represented in UTF-16, so the Java char type represents a 16-bit UTF-16 code unit. Each Unicode character is represented by either one or two of these code units, and each code unit can be written as \uxxxx in string literals (note that there are only 4 hex digits in these sequences, since code units are 16 bits long).
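To make the code unit point concrete, here is a minimal sketch that reports how many UTF-16 code units a code point needs and prints the matching backslash-u escapes. The class name, the helper names, and the sample code points (in particular U+1F600, picked here as a character outside the Basic Multilingual Plane) are illustrative assumptions, not part of the original answer.

```java
class CodeUnitDemo {
    // Number of UTF-16 code units (Java chars) needed for one code point:
    // 1 for U+0000..U+FFFF, 2 (a surrogate pair) above that.
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Build the backslash-u escape text for each code unit of the code point.
    // Each escape carries exactly 4 hex digits because a code unit is 16 bits.
    static String escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample code points chosen for illustration only.
        int[] samples = {0x0021, 0x00B6, 0x2031, 0x1F600};
        for (int cp : samples) {
            System.out.println(String.format("U+%04X", cp)
                    + " -> " + utf16Units(cp) + " code unit(s): " + escapes(cp));
        }
    }
}
```

A code point above U+FFFF comes out as two escapes (a high and a low surrogate), which is why a single \uxxxx sequence can never name such a character on its own.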

So, if you need a reference of Unicode characters with their UTF-8 and UTF-16 representations, you can take a look at the table at fileformat.info.

See also:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
  • Unicode - How to get the characters right?
  • A to Z Index of Unicode Characters


As axtavt points out, the concept of n-byte Unicode characters is meaningless. Assuming you mean UTF-8, a very simple table that might help you with testing could look like the following. Note that all example characters work on my browser (Chrome on Ubuntu), but your mileage may vary in terms of displaying, copying/pasting, etc.

UTF-8 bytes  Start    End       Example Character
1            U+0000   U+007F    ! EXCLAMATION MARK (U+0021)
2            U+0080   U+07FF    ¶ PILCROW SIGN (U+00B6)
3            U+0800   U+FFFF    ‱ PER TEN THOUSAND SIGN (U+2031)
4            U+10000  U+10FFFF  
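
If you would rather check the byte counts yourself than trust a table, a small sketch like the one below encodes a code point as UTF-8 and counts the bytes. The class and method names are made up for this example, and U+1F600 is just a convenient 4-byte sample chosen here, not one taken from the table. As for the 5-byte and 6-byte cases in the question: the original UTF-8 design allowed such sequences, but RFC 3629 restricts UTF-8 to at most 4 bytes, and Unicode itself stops at U+10FFFF, so no valid character needs more.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Encode a single code point as UTF-8 and return the number of bytes.
    // Note: lone surrogates (U+D800..U+DFFF) are not valid characters and
    // would be replaced during encoding, so do not pass them in.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Sample code points chosen for illustration only.
        int[] samples = {0x0021, 0x00B6, 0x2031, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d UTF-8 byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

Passing the boundary values of each row (for example 0x7F and 0x80) is a quick way to confirm where the encoding switches from one length to the next.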