Is "SET CHARACTER SET utf8" necessary?
I´m rewritting our database class (PDO based), and got s开发者_如何学Gotuck at this. I´ve been taught to both use SET NAMES utf8
and SET CHARACTER SET utf8
when working with UTF-8 in PHP and MySQL.
In PDO I now want to use the PDO::MYSQL_ATTR_INIT_COMMAND
parameter, but it only supports one query.
Is SET CHARACTER SET utf8
necessary?
Using SET CHARACTER SET utf8
after using SET NAMES utf8
will actually reset the character_set_connection
and collation_connection
to
@@character_set_database
and @@collation_database
respectively.
The manual states that
SET NAMES x
is equivalent toSET character_set_client = x; SET character_set_results = x; SET character_set_connection = x;
and
SET CHARACTER SET x
is equivalent toSET character_set_client = x; SET character_set_results = x; SET collation_connection = @@collation_database;
whereas SET collation_connection = x
also internally executes SET character_set_connection = <<character_set_of_collation_x>>
and SET character_set_connection = x
internally also executes SET collation_connection = <<default_collation_of_character_set_x
.
So essentially you're resetting character_set_connection
to @@character_set_database
and collation_connection
to @@collation_database
. The manual explains the usage of these variables:
What character set should the server translate a statement to after receiving it?
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
To sum this up, the encoding/transcoding procedure MySQL uses to process the query and its results is a multi-step-thing:
- MySQL treats the incoming query as being encoded in
character_set_client
. - MySQL transcodes the statement from
character_set_client
intocharacter_set_connection
- when comparing string values to column values MySQL transcodes the string value from
character_set_connection
into the character set of the given database column and uses the column collation to do sorting and comparison. - MySQL builds up the result set encoded in
character_set_results
(this includes result data as well as result metadata such as column names and so on)
So it could be the case that a SET CHARACTER SET utf8
would not be sufficient to provide full UTF-8 support. Think of a default database character set of latin1
and columns defined with utf8
-charset and go through the steps described above. As latin1
cannot cover all the characters that UTF-8 can cover you may lose character information in step 3.
- Step 3: Given that your query is encoded in UTF-8 and contains characters that cannot be represented with
latin1
, these characters will be lost on transcoding fromutf8
tolatin1
(the default database character set) making your query fail.
So I think it's safe to say that SET NAMES ...
is the correct way to handle character set issues. Even though I might add that setting up your MySQL server variables correctly (all the required variables can be set statically in your my.cnf
) frees you from the performance overhead of the extra query required on every connect.
From the mysql manual:
SET CHARACTER SET is similar to SET NAMES but sets
character_set_connection
andcollation_connection
tocharacter_set_database
andcollation_database
. ASET CHARACTER SET x
statement is equivalent to these three statements:SET character_set_client = x; SET character_set_results = x; SET collation_connection = @@collation_database;
Since needing to support international characters sets, I've always just set the the character set of the text type fields on database creation.
I've also always used UTF-8.
Within PHP set the same:
mb_internal_encoding( 'UTF-8' );
The answer many people are looking for is if these queries are needed at all.
As stated in the docs:
If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one.
Note that this is an if
clause. It means these queries are needed only if you want to use a charset different from the default your MySQL is using. Doing the same query over and over if not needed like this is a useless waste of resources and should be avoided:
At the time of writing (MySQL 8.0.29) the default MySQL server character set is utf8mb4
and there are no plans to change it for the foreseeable future. You should first check what the current values are, if you get results like these, these queries can be safely removed:
mysql> select @@character_set_client;
+------------------------+
| @@character_set_client |
+------------------------+
| utf8mb4 |
+------------------------+
1 row in set (0.00 sec)
mysql> select @@character_set_connection;
+----------------------------+
| @@character_set_connection |
+----------------------------+
| utf8mb4 |
+----------------------------+
1 row in set (0.00 sec)
mysql> select @@character_set_results;
+-------------------------+
| @@character_set_results |
+-------------------------+
| utf8mb4 |
+-------------------------+
1 row in set (0.00 sec)
mysql> select @@collation_connection;
+------------------------+
| @@collation_connection |
+------------------------+
| utf8mb4_0900_ai_ci |
+------------------------+
1 row in set (0.00 sec)
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_0900_ai_ci |
+----------------------+
1 row in set (0.00 sec)
If you get different results but you control the MySQL configuration, you should change them in the MySQL config files, unless you have different applications requiring different charset. (But nowadays utf8mb4
is the standard and there are very few valid reasons to use a different charset)
精彩评论