Question on unicode translation

2023-02-10 06:03

How are variable-length UTF-8 encoded bytes decoded/translated to Unicode characters?

Each byte with a value above 127 (binary 01111111, hex 7F) is part of a multibyte character.

So, if the first bit is 0, done: it's a single-byte character. If not, the byte belongs to a multibyte sequence. A byte starting with 10 is a continuation byte; a byte starting with 110, 1110 or 11110 is a lead byte, and its number of leading 1 bits tells you how many bytes are in this character (the original design allowed sequences of up to 6 bytes, but UTF-8 as currently defined only uses 1-4 byte sequences).
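As a rough illustration of the bit tests involved, here is a minimal Python sketch. The function name classify_utf8_byte and its return labels are made up for this example, and it only inspects the bit pattern of a single byte; it does not reject bytes such as C0, C1 or F5-F7, which match a lead pattern but never occur in valid UTF-8.

```python
def classify_utf8_byte(b: int) -> str:
    """Describe the role of one byte (0-255) based only on its high bits."""
    if (b & 0b1000_0000) == 0:
        return "single"        # 0xxxxxxx: complete 1-byte (ASCII) character
    if (b & 0b1100_0000) == 0b1000_0000:
        return "continuation"  # 10xxxxxx: follows a lead byte
    if (b & 0b1110_0000) == 0b1100_0000:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if (b & 0b1111_0000) == 0b1110_0000:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if (b & 0b1111_1000) == 0b1111_0000:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"           # 11111xxx: not used by UTF-8


# Show the role of each byte in the UTF-8 encoding of U+00E9 and U+20AC.
for byte in "é€".encode("utf-8"):
    print(f"{byte:08b} {classify_utf8_byte(byte)}")
```

For "é€" this should show one lead-2 byte followed by one continuation byte, then one lead-3 byte followed by two continuation bytes.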

For the history and a more detailed explanation, see this article by Our Fearless Leader ;) The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), or the Wikipedia article on UTF-8 (which has more technical details on valid/invalid byte combinations).

i think it's here

UTF-8 is an encoding of Unicode, so there is no translation between character sets; decoding just recovers the code points from the bytes. If you mean "How do I see non-ASCII characters on screen when I'm displaying a Unicode string", you need to ensure you have a Unicode-capable font installed and in use.

My company is using this font.

It is as @Piskvor describes.

The algorithms for encoding/decoding UTF-8 are described in RFC 3629.

The following table of (32 bit) Unicode code point ranges to byte sequences comes from that document:

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
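
Reading the table right to left gives a decoder: the lead byte selects the row, and the x bits are concatenated to form the code point. The following is a minimal Python sketch of that idea, not the RFC's reference code; the function name decode_utf8 and its error messages are invented for this example. In practice, Python's built-in bytes.decode("utf-8") does this for you.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of Unicode code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Pick the table row from the lead byte: sequence length, the
        # payload bits of the lead byte, and the row's smallest code point.
        if lead < 0x80:                 # 0xxxxxxx
            length, cp, minimum = 1, lead, 0x0
        elif (lead & 0xE0) == 0xC0:     # 110xxxxx
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif (lead & 0xF0) == 0xE0:     # 1110xxxx
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif (lead & 0xF8) == 0xF0:     # 11110xxx
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in tail:
            if (b & 0xC0) != 0x80:      # every trailing byte must be 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)  # append the 6 payload bits

        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"code point out of range at offset {i}")

        code_points.append(cp)
        i += length
    return code_points


print([hex(cp) for cp in decode_utf8("é€".encode("utf-8"))])  # ['0xe9', '0x20ac']
```

The minimum check rejects overlong forms (for example C0 80 for U+0000), and the final range check rejects surrogates and values above U+10FFFF, both of which RFC 3629 forbids.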
