MySQL CHAR() Function and UTF8 Output?

2022-12-21 20:49 问答作者：

+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                  开发者_C百科                                 |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.1.41-osx10.5-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.41    |
+-----------+
1 row in set (0.00 sec)

mysql> select char(0x00FC);
+--------------+
| char(0x00FC) |
+--------------+
| ?            |
+--------------+
1 row in set (0.00 sec)

Expecting actual utf8 character --> " ü " instead of " ? " Tried char(0x00FC using utf8) also, but no go.

Using mysql version 5.1.41

Been allover the Google, cannot find anything on this. The MySQL docs simply say that multibyte output is expected on values greater than 255, after mysql version 5.0.14.

Thanks

You are confusing UTF-8 with Unicode.

0x00FC is the Unicode code point for ü:

mysql> select char(0x00FC using ucs2);
+----------------------+
| char(0x00FC using ucs2) |
+----------------------+
| ü                   | 
+----------------------+

In UTF-8 encoding, 0x00FC is represented by two bytes:

mysql> select char(0xC3BC using utf8);
+-------------------------+
| char(0xC3BC using utf8) |
+-------------------------+
| ü                      | 
+-------------------------+

UTF-8 is merely a way of encoding Unicode characters in binary form. It is meant to be space efficient, which is why ASCII characters only take a single byte, and iso-8859-1 characters such as ü only take two bytes. Some other characters take three or four bytes, but they are much less common.

Adding to Martin's answer:

You can use an "introducer" instead of the CHAR() function. To do this, you specify the encoding, prefixed with an underscore, before the code point:
```
_utf16 0xFC
```
or:
```
_utf16 0x00FC
```
If the goal is to specify the code point instead of the encoded byte sequence, then you need to use an encoding in which the code point value just happens to be the encoded byte sequence. For example, as shown in Martin's answer, 0x00FC is both the code point value for ü and the encoded byte sequence for ucs2 / utf16 (they are effectively the same encoding for BMP characters, but I prefer to use "utf16" as it is consistent with "utf8" and "utf32", consistent in the "utf" theme).

But, utf16 only works for BMP characters (code points U+0000 - U+FFFF) in terms of specifying the code point value. If you want a Supplementary Character, then you will need to use the utf32 encoding. Not only does _utf32 0xFC return ü, but:
```
_utf32 0x1F47E
```
returns: 👾

For more details on these options, plus Unicode escape sequences for other languages and platforms, please see my post:

Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)

继续阅读：escaping string-literals unicode utf-8

MySQL CHAR() Function and UTF8 Output?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？