How does a program read Unicode? [closed]
Unicode characters can take a variable number of bytes, since a character may be represented by 2 bytes or by more (a sequence of 2-byte units). So if they are stored in binary format, how can a program know how to read them back?
Let's say 'a' is represented by 0F0F 13F3
and 'b' is represented by 02AD BC39 09F3 459F
If I write them in file foo.txt:
0F0F 13F3 02AD BC39 09F3 459F
Then how would I know where to stop for 'a' and 'b'?
Guys, here I am talking about reading and writing pure Unicode, i.e. without converting it into any other format based on a popular charset such as UTF-8.
First, not all Unicode representations are variable length. UTF-32 and UCS-2 are fixed length. UTF-8 and UTF-16 are each, in their own way, variable length.
Second, if you read the specification, you will learn that the sequences are self-describing. In UTF-8, the byte values that can appear as a first byte can never appear as a second or third byte, and so on. The same goes for the surrogate pairs that represent non-BMP characters in UTF-16.
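To make the "self-describing" point concrete, here is a small sketch in Python (my choice of language, not something from the question) that classifies a UTF-8 byte by its high bits and checks whether a UTF-16 code unit starts a surrogate pair:

    def utf8_byte_kind(b):
        """Classify one UTF-8 byte by its high bits."""
        if b < 0x80:            # 0xxxxxxx
            return "single-byte (ASCII) character"
        if b < 0xC0:            # 10xxxxxx
            return "continuation byte (never a first byte)"
        if b < 0xE0:            # 110xxxxx
            return "lead byte of a 2-byte sequence"
        if b < 0xF0:            # 1110xxxx
            return "lead byte of a 3-byte sequence"
        return "lead byte of a 4-byte sequence"   # 11110xxx

    def is_utf16_high_surrogate(unit):
        """A code unit in 0xD800-0xDBFF must be followed by a low
        surrogate (0xDC00-0xDFFF) to form one code point."""
        return 0xD800 <= unit <= 0xDBFF

    for b in "é".encode("utf-8"):
        print(hex(b), "->", utf8_byte_kind(b))

Because the continuation-byte range (0x80-0xBF) never overlaps the lead-byte ranges, a reader that lands in the middle of a stream can always resynchronize at the next lead byte.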
A commonly used encoding is UTF-8. The way it's structured is that some predefined bits of the character's bytes tell you whether there are more bytes to come.
See http://en.wikipedia.org/wiki/UTF-8#Design for a nice diagram.
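As a rough illustration of that diagram, here is a hand-rolled Python decoder (a sketch only; a real program would just call bytes.decode("utf-8")) that uses nothing but the leading bits of each byte to decide how many bytes belong to the current character:

    def decode_utf8(data):
        """Decode UTF-8 by hand, using the lead byte's bit pattern
        to determine the length of each character's sequence."""
        i, out = 0, []
        while i < len(data):
            b = data[i]
            if b < 0x80:                      # 0xxxxxxx -> 1 byte
                cp, extra = b, 0
            elif b >> 5 == 0b110:             # 110xxxxx -> 2 bytes
                cp, extra = b & 0x1F, 1
            elif b >> 4 == 0b1110:            # 1110xxxx -> 3 bytes
                cp, extra = b & 0x0F, 2
            elif b >> 3 == 0b11110:           # 11110xxx -> 4 bytes
                cp, extra = b & 0x07, 3
            else:
                raise ValueError("invalid lead byte")
            for b2 in data[i + 1 : i + 1 + extra]:
                if b2 >> 6 != 0b10:           # continuation bytes are 10xxxxxx
                    raise ValueError("invalid continuation byte")
                cp = (cp << 6) | (b2 & 0x3F)  # append 6 payload bits
            out.append(chr(cp))
            i += 1 + extra
        return "".join(out)

    print(decode_utf8("héllo €".encode("utf-8")))   # héllo €

So the answer to "where do I stop for 'a' and 'b'?" is that the encoding itself marks the boundaries; you never have to guess.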