UTF-8 vs ASCII Text
Why do SQL databases use UTF-8 encoding? Do UTF-8 and ASCII both use 8 bits to store a character?
UTF-8 is used to support a large range of characters. In UTF-8, up to 4 bytes can be used to represent a single character.
Joel Spolsky has written an article on this subject that you may want to refer to:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
For "normal" (ASCII) characters, UTF-8 uses a single byte. Characters outside the ASCII range take two to four bytes. This makes UTF-8 a variable-length encoding.
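As a rough illustration of the variable length, here is a small Python sketch that counts the bytes each character takes in UTF-8. The sample characters and the helper name `utf8_length` are my own picks for illustration, not anything from the question.

```python
# Sample characters chosen for illustration, one per UTF-8 length class:
# "A" (U+0041), "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s)")
```

The four samples come out at 1, 2, 3 and 4 bytes respectively.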
Wikipedia has a good article on UTF-8.
ASCII defines only 128 characters, so it needs only 7 bits per character, but it is normally stored using 8 bits per character. RS-232 (old serial communication) can be configured to use 7 data bits per character.
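A quick way to check the 7-bit point in Python. This is only a sketch, and the example strings and the helper `is_ascii_encodable` are made up for the demonstration.

```python
text = "hello"

# Every ASCII character has a code point below 128, so it fits in 7 bits.
fits_in_7_bits = all(ord(ch) < 128 for ch in text)

# For pure ASCII text, the ASCII and UTF-8 encodings are byte-for-byte identical.
same_bytes = text.encode("ascii") == text.encode("utf-8")

def is_ascii_encodable(s: str) -> bool:
    """Return True if every character of s can be encoded as ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_7_bits, same_bytes)
print(is_ascii_encodable("hello"), is_ascii_encodable("caf\u00e9"))
```

The second string contains "é" (U+00E9), which is outside the 128 ASCII characters, so the ASCII codec rejects it while UTF-8 would encode it as two bytes.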
ASCII can represent only a small, fixed set of characters, so it isn't very useful for any language that isn't based on the basic Latin alphabet. UTF-8, however, is an encoding of Unicode (UCS-4) and can represent almost any language. It does this by chaining multiple bytes together to represent one character (or code point, to be more correct).
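To make the "chaining" concrete, here is a hand-rolled sketch of how one multi-byte sequence is turned back into a code point. The function `decode_utf8_char` is hypothetical and only does minimal validation (it does not reject overlong forms or surrogates); in real code you would simply call `bytes.decode("utf-8")`.

```python
def decode_utf8_char(data: bytes) -> int:
    """Decode the first UTF-8 encoded character in data to its code point."""
    first = data[0]
    if first < 0x80:               # 0xxxxxxx: single byte, plain ASCII
        return first
    if first >> 5 == 0b110:        # 110xxxxx: lead byte of a 2-byte sequence
        length, code = 2, first & 0x1F
    elif first >> 4 == 0b1110:     # 1110xxxx: lead byte of a 3-byte sequence
        length, code = 3, first & 0x0F
    elif first >> 3 == 0b11110:    # 11110xxx: lead byte of a 4-byte sequence
        length, code = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for b in data[1:length]:
        if b >> 6 != 0b10:         # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        code = (code << 6) | (b & 0x3F)  # append the 6 payload bits
    return code

euro = "\u20ac".encode("utf-8")
print([f"{b:08b}" for b in euro])   # ['11100010', '10000010', '10101100']
print(hex(decode_utf8_char(euro)))  # 0x20ac
```

The lead byte says how many bytes belong to the character, and each following byte contributes six more bits.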
A more sophisticated encoding can increase index access time noticeably. It's something to think about when you run into performance problems writing to or reading from a database.