UTF-8 vs ASCII Text

Why do SQL databases use UTF-8 encoding? Don't ASCII and UTF-8 both use 8 bits to store a character?


UTF-8 is used to support a large range of characters. In UTF-8, up to 4 bytes can be used to represent a single character.

Joel has written an article on this subject that you may want to refer to:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)


For "normal" characters (the ASCII range, code points 0 to 127), only 8 bits are used. Characters that do not fit in that range use more bytes, up to four. This makes UTF-8 a variable-length encoding.
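To make the variable length concrete, here is a minimal Python sketch that encodes a few characters and counts the resulting bytes. The sample characters and the helper name `utf8_length` are just chosen for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Escapes are used so the exact code points are unambiguous.
samples = [
    "A",           # U+0041, plain ASCII letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

The four samples take 1, 2, 3 and 4 bytes respectively, so the storage cost depends on which characters the text actually contains.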

Wikipedia has a good article on UTF-8.

ASCII defines only 128 characters, so 7 bits are enough. It is normally stored using 8 bits per character, though. RS-232 (old serial communication) can be used with 7-bit bytes.
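One consequence of ASCII fitting in 7 bits is that pure ASCII text comes out byte-for-byte identical when encoded as UTF-8. A rough Python check, assuming nothing beyond the standard library (the function name and sample strings are made up for the example):

```python
def encodes_same_in_ascii_and_utf8(text: str) -> bool:
    """True if the text is pure ASCII and its UTF-8 bytes match its ASCII bytes."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character lies outside the 128 ASCII code points.
        return False
    return ascii_bytes == text.encode("utf-8")

plain = "SELECT name FROM users"
accented = "caf\u00e9"  # ends with U+00E9, which is not an ASCII character

print(encodes_same_in_ascii_and_utf8(plain))     # True
print(max(plain.encode("utf-8")) < 128)          # True: the top bit is never set
print(encodes_same_in_ascii_and_utf8(accented))  # False
```

So for plain English text, storing it as UTF-8 costs no more space than ASCII stored one byte per character.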


ASCII can only represent a limited number of characters. It isn't very useful for representing any language that isn't based on the Latin alphabet. UTF-8, however, is an encoding of Unicode (UCS-4) and can represent almost any language. It does this by chaining multiple bytes together to represent one character (or code point, to be more correct).
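The byte chaining is visible in the bit patterns: in a multi-byte sequence the first byte starts with as many 1 bits as there are bytes in total, and every following byte starts with `10`. A small Python sketch that prints the bits (the helper `utf8_bits` is a name invented here):

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each UTF-8 byte of a character as an 8-digit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# A single-byte character always has a leading 0 bit.
print(utf8_bits("A"))       # ['01000001']

# The euro sign (U+20AC) needs three bytes: a 1110xxxx lead byte
# followed by two 10xxxxxx continuation bytes.
print(utf8_bits("\u20ac"))  # ['11100010', '10000010', '10101100']
```

Because continuation bytes always begin with `10`, a decoder can tell where a character starts even if it lands in the middle of a sequence.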


A more sophisticated encoding can increase index access time considerably. It's something to think about when you run into performance problems writing to or reading from a database.
