How was the position of the Surrogates Area (UTF-16) chosen?

Was the position of the UTF-16 surrogates area (U+D800..U+DFFF) chosen at random, or is there a logical reason why it sits where it does?


The surrogates area was added in Unicode 2.0, to expand the code beyond 65536 code points while retaining compatibility with the existing 16-bit representation. To encode the 20 bits necessary to represent the 1048576 new code points, they took 1024 code points to represent the first 10 bits and another 1024 to represent the second 10 bits (they used 2048 distinct values rather than reusing the same 1024 so that the code is self-synchronizing: a single 16-bit unit tells you whether it is the first or second half of a pair). For efficiency in recognizing the surrogates, it would be best if all 2048 shared a (binary) prefix.
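To make the bit layout concrete, here is a minimal Python sketch of the scheme described above. The function names are my own, for illustration only (they are not from any library), and it assumes the standard UTF-16 ranges of U+D800..U+DBFF for the first half and U+DC00..U+DFFF for the second half:

```python
def is_surrogate(unit):
    # All 2048 surrogates share the 5-bit prefix 11011.
    return unit >> 11 == 0b11011

def is_high_surrogate(unit):
    # First half of a pair: prefix 110110 (U+D800..U+DBFF).
    return unit >> 10 == 0b110110

def is_low_surrogate(unit):
    # Second half of a pair: prefix 110111 (U+DC00..U+DFFF).
    return unit >> 10 == 0b110111

def to_surrogates(cp):
    # Split a supplementary code point (U+10000..U+10FFFF) into two 16-bit units.
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    # Recombine a surrogate pair into the original code point.
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF))
```

For example, `to_surrogates(0x1F600)` gives `(0xD83D, 0xDE00)`, and each membership check is a single shift and compare, which is the efficiency the shared prefix buys.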

I can only guess that they wanted to shove this unusually-purposed block to higher rather than lower codepoints. The blocks 0xE000–0xE7FF, 0xE800–0xEFFF, and 0xF000–0xF7FF were already reserved for the "private use" area, and 0xF800–0xFFFF was also partially reserved for private use and partially used for other codes. So 0xD800–0xDFFF would have been the highest block available.


Unicode was originally designed as a 16-bit code, and had already assigned a bunch of characters before the need for “supplementary planes” was recognized. The largest available block was U+A000 – U+DFFF, so surrogates would have to go somewhere in there.
