Please talk me through MySQL collations
I'm setting up a new MySQL database for a website and I want the content-type to be UTF-8.
Why does MySQL have so many different, seemingly language-specific collations for UTF-8? Isn't the point of UTF-8 that it will encompass all those languages without switching encodings? What are those "_bin" and "_cs" and "_ci" notations? Will choosing "_bin" make some operations case-sensitive?
My site will be mostly in English, but obviously I would like to be able to do things like paste a Japanese character into my text without incident.
I think the MySQL docs have the best bird's eye view I've read.
Short story . . .
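Collations define the order for sorting and comparison. The character set (UTF-8) only determines how characters are stored; the collation decides how they are compared and sorted, and since different languages sort the same characters differently, one encoding can have many language-specific collations. Collations ending in "_cs" are case-sensitive; "_ci" means case-insensitive. For text, char, or varchar, you probably want one of those two.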
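A minimal sketch of what that looks like in practice. The database name site_db is hypothetical, and utf8mb4 with utf8mb4_unicode_ci is my assumption rather than anything from the question: MySQL's plain "utf8" stores at most three bytes per character, so utf8mb4 is the safer pick if you might ever need characters outside the Basic Multilingual Plane. Check what your server version actually offers with SHOW COLLATION.

```sql
-- Hypothetical database; charset and collation are assumptions, adjust to taste.
CREATE DATABASE site_db
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- List the collations your server provides for this character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- With a "_ci" collation, letter case is ignored in comparisons.
-- The _utf8mb4 introducer makes the literals independent of the connection charset.
SELECT _utf8mb4'abc' COLLATE utf8mb4_unicode_ci = _utf8mb4'ABC' AS same_ci;  -- 1
```

Tables and columns created in this database inherit its default collation unless you override it per table or per column.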
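With a "_bin" collation, sorting and comparison use the numeric code value of each character, so 'a' and 'A' compare as different. In practice "_bin" behaves case-sensitively, and it also distinguishes accented from unaccented letters. Do not confuse this with the binary string types (BINARY, VARBINARY, BLOB) and their "binary" collation: those values are compared and stored byte-for-byte with no character set conversion, every byte is significant, including trailing spaces, and uppercase and lowercase have no meaning for them. (Nonbinary values, including those with a "_bin" collation, might be converted to a different character set when copied to a column that uses one.)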
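To see how a "_bin" collation and a true binary string behave, here is a rough sketch. It assumes the utf8mb4 character set and its utf8mb4_bin collation (a PAD SPACE collation); the table and column names are made up for illustration.

```sql
-- "_bin" compares character code values, so case matters.
SELECT _utf8mb4'abc' COLLATE utf8mb4_bin = _utf8mb4'ABC' AS same_bin;       -- 0

-- utf8mb4_bin is a PAD SPACE collation: trailing spaces are ignored in comparisons.
SELECT _utf8mb4'a ' COLLATE utf8mb4_bin = _utf8mb4'a' AS trailing_bin;      -- 1

-- Binary strings compare byte-for-byte, so the trailing space counts.
SELECT CAST('a ' AS BINARY) = CAST('a' AS BINARY) AS trailing_binary;       -- 0

-- Hypothetical table: override the collation on just the column that must
-- be compared case-sensitively, and leave the rest on a "_ci" default.
CREATE TABLE example_tokens (
  token VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);
```

Setting the collation per column like this keeps ordinary text searches case-insensitive while exact-match fields such as tokens stay case-sensitive.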