
UTF-8 issues from PHP to MySQL: why am I getting question marks in the database?

OK, I am currently in PHP/MySQL/UTF-8/Unicode hell!

My environment:

MySQL: 5.1.53
Server characterset: latin1
Db characterset: latin1
Client characterset: latin1
Conn. characterset: latin1

PHP: 5.3.3

My PHP files are saved as UTF-8, not as ASCII.

In my PHP code when I make the database connection I do the following:

ini_set('default_charset', 'utf-8');
$my_db = mysql_connect(DEV_DB, DEV_USER, DEV_PASS);
mysql_select_db(MY_DB);

// I have tried both of the following utf8 connection functions
// mysql_query("SET NAMES 'utf8'", $my_db);
mysql_set_charset('utf8', $my_db);

// Detect if form value is not UTF-8
if (mb_detect_encoding($_POST['lang_desc']) == 'UTF-8') {
    $lang_description = $_POST['lang_desc'];
} else {
    $lang_description = utf8_encode($_POST['lang_desc']);
}

$language_sql = sprintf(
    'INSERT INTO app_languages (language_id, app_id, description) VALUES (%d, %d, "%s")',
    intval($lang_data['lang_id']),
    intval($new_app_id),
    mysql_real_escape_string($lang_description, $my_db)
);

My MySQL table is created like this:

CREATE TABLE IF NOT EXISTS app_languages (
    language_id int(10) unsigned NOT NULL,
    app_id int(10) unsigned NOT NULL,
    description tinytext collate utf8_unicode_ci,
    PRIMARY KEY (language_id, app_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The SQL statements that are generated from my PHP code look like this:

INSERT INTO app_languages (language_id, app_id, description) VALUES (91, 2055, "阿拉伯体育新闻和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (26, 2055, "阿拉伯體育新聞和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (56, 2055, "בערבית ספורט חדשות ומידע")
INSERT INTO app_languages (language_id, app_id, description) VALUES (69, 2055, "アラビア語のスポーツニュースと情報")

Yet, the output appears in my database as this:

|          69 |   2055 | ?????????????????                               |
|          56 |   2055 | ?????? ????? ????? ?????                        |
|          28 |   2055 | Arapski sportske vijesti i informacije          |
|          42 |   2055 | Arabe des nouvelles sportives et d\'information |
|          91 |   2055 | ??????????                                      |

What am I doing wrong??

P.S. We can use PuTTY to SSH directly to the database server and paste one of the Unicode/multilingual INSERT statements at the command line, and it works!?

Thanks for any light you can shed on this, it's driving me mad.

Cheers, Jason


Try executing the following query after you have selected the database:

SET NAMES 'utf8'

This query should resolve the mismatch between the character set of your files and that of the database.

felix
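
To check whether `SET NAMES` actually took effect, it can help to look at what the server reports for the session. The following is only a rough sketch: it assumes the rows of `SHOW VARIABLES LIKE 'character_set_%'` have already been fetched into an associative array, the helper name `charset_mismatches` is made up for illustration, and the sample values are the ones listed in the question (`character_set_results` is not listed there and is assumed to be latin1 as well).

```php
<?php
// Hypothetical helper (not part of PHP or MySQL): given name => value pairs
// from SHOW VARIABLES LIKE 'character_set_%', return the session variables
// that do not match the expected character set.
function charset_mismatches(array $vars, $expected = 'utf8')
{
    // These are the three session variables that SET NAMES changes.
    $session_keys = array(
        'character_set_client',
        'character_set_connection',
        'character_set_results',
    );

    $bad = array();
    foreach ($session_keys as $key) {
        $actual = isset($vars[$key]) ? $vars[$key] : null;
        if ($actual !== $expected) {
            $bad[$key] = $actual;
        }
    }
    return $bad;
}

// Values as described in the question, before SET NAMES is issued.
// character_set_results is an assumption; the question does not list it.
$before = array(
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_results'    => 'latin1',
    'character_set_database'   => 'latin1',
    'character_set_server'     => 'latin1',
);

// What the session variables should look like once SET NAMES 'utf8' has run.
$after = array_merge($before, array(
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_results'    => 'utf8',
));

print_r(charset_mismatches($before)); // lists all three session variables
print_r(charset_mismatches($after));  // empty array
```

If the list is still non-empty right before the INSERT runs, the charset call was probably issued on a different connection or did not succeed.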


The answer is right in your question. You're using latin1 throughout your database setup (server, database, client and connection), and latin1 cannot represent Chinese, Hebrew or Japanese characters. You need to change those settings to UTF-8 as well.
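
The effect can be reproduced without a database. This is a small sketch that assumes the mbstring extension is available and that the script itself is saved as UTF-8. It converts some of the question's strings from UTF-8 to Latin-1 in PHP; MySQL is not involved here, but a conversion to latin1 on the server side would be expected to lose the same characters.

```php
<?php
// Make the substitute character explicit ('?' is also mbstring's default).
mb_substitute_character(0x3F);

// Sample strings taken from the INSERT statements in the question.
$samples = array(
    'zh' => '阿拉伯体育新闻和信息',
    'ja' => 'アラビア語のスポーツニュースと情報',
    'fr' => "Arabe des nouvelles sportives et d'information",
);

$converted = array();
foreach ($samples as $lang => $text) {
    // Convert from UTF-8 to Latin-1. Characters with no Latin-1 equivalent
    // are replaced by the substitute character, one per character.
    $converted[$lang] = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    echo $lang, ': ', $converted[$lang], "\n";
}
```

The Chinese and Japanese strings come out as one `?` per character, while the French string survives unchanged. That is the same pattern as in the table output in the question, which suggests the text is being converted to latin1 somewhere between the form and the screen.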


// First, make sure your page is actually served as UTF-8
header('Content-Type: text/html; charset=utf-8');


mb_detect_encoding is of little use unless you already know which encodings you might be dealing with. You probably should not rely on it unless you specify the second and third arguments (the list of candidate encodings and the strict flag). As written, it probably does not return what you think it does.
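
One alternative is to validate rather than detect. The sketch below assumes the mbstring extension is loaded; `ensure_utf8` is a hypothetical helper name, and the ISO-8859-1 fallback is an assumption about what non-UTF-8 input would be, not something the code can detect.

```php
<?php
// Hypothetical helper: return $value as UTF-8. Input that is not valid
// UTF-8 is assumed (not detected) to be in $fallback and is converted.
function ensure_utf8($value, $fallback = 'ISO-8859-1')
{
    // mb_check_encoding() only answers "is this byte sequence valid UTF-8?",
    // which is a narrower and more reliable question than guessing.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// Valid UTF-8 passes through untouched.
$japanese = ensure_utf8('アラビア語');

// "caf\xE9" is "café" in Latin-1 and is not valid UTF-8, so it is converted.
$french = ensure_utf8("caf\xE9");
```

In the question's code this would take the place of the `mb_detect_encoding` / `utf8_encode` branch, for example `$lang_description = ensure_utf8($_POST['lang_desc']);`.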


I see that the words that show up as ??????? are Arabic words, which would need the collation

cp1256_general_ci

not

utf8_general_ci

Change that; it may solve the problem.
