UTF8 issues PHP -> MySQL. Getting question marks in database?
OK, I am currently in PHP/MySQL/UTF-8/Unicode hell!
My environment:
MySQL: 5.1.53
Server characterset: latin1
Db characterset: latin1
Client characterset: latin1
Conn. characterset: latin1
PHP: 5.3.3
My PHP files are saved as UTF-8, not ASCII.
In my PHP code when I make the database connection I do the following:
ini_set('default_charset', 'utf-8');
$my_db = mysql_connect(DEV_DB, DEV_USER, DEV_PASS);
mysql_select_db(MY_DB);
// I have tried both of the following utf8 connection functions
// mysql_query("SET NAMES 'utf8'", $my_db);
mysql_set_charset('utf8', $my_db);
// Detect if form value is not UTF-8
if (mb_detect_encoding($_POST['lang_desc']) == 'UTF-8') {
    $lang_description = $_POST['lang_desc'];
} else {
    $lang_description = utf8_encode($_POST['lang_desc']);
}
$language_sql = sprintf(
    'INSERT INTO app_languages (language_id, app_id, description) VALUES (%d, %d, "%s")',
    intval($lang_data['lang_id']),
    intval($new_app_id),
    mysql_real_escape_string($lang_description, $my_db)
);
The CREATE statement for my MySQL table is:
CREATE TABLE IF NOT EXISTS app_languages (
    language_id int(10) unsigned NOT NULL,
    app_id int(10) unsigned NOT NULL,
    description tinytext collate utf8_unicode_ci,
    PRIMARY KEY (language_id, app_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The SQL statements that are generated from my PHP code look like this:
INSERT INTO app_languages (language_id, app_id, description) VALUES (91, 2055, "阿拉伯体育新闻和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (26, 2055, "阿拉伯體育新聞和信息")
INSERT INTO app_languages (language_id, app_id, description) VALUES (56, 2055, "בערבית ספורט חדשות ומידע")
INSERT INTO app_languages (language_id, app_id, description) VALUES (69, 2055, "アラビア語のスポーツニュースと情報")
Yet, the output appears in my database as this:
| 69 | 2055 | ????????????????? |
| 56 | 2055 | ?????? ????? ????? ????? |
| 28 | 2055 | Arapski sportske vijesti i informacije |
| 42 | 2055 | Arabe des nouvelles sportives et d\'information |
| 91 | 2055 | ?????????? |
What am I doing wrong??
P.S. If we use PuTTY to SSH directly to the database server and paste one of the Unicode/multilingual INSERT statements at the command line, it works!?
Thanks for any light you can shed on this, it's driving me mad.
Cheers, Jason
Try executing the following query after you have selected the database:
SET NAMES 'utf8'
This query should resolve the charset mismatch between your files and the database.
felix
The answer is right in your question. You're using latin1 throughout your database, and it can't handle unicode. You need to change those to UTF-8 as well.
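To see why a latin1 step anywhere in the chain produces question marks, here is a minimal sketch in plain PHP. It is only an analogy for what the server does, not MySQL's own code: it converts UTF-8 text to ISO-8859-1 with mbstring, and every character latin1 cannot represent becomes a single "?". The function name to_latin1 is made up for this example, and it assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Sketch: what a UTF-8 -> latin1 conversion does to characters latin1 lacks.
// Assumes the mbstring extension is available and this file is saved as UTF-8.

// Make the substitution character explicit: '?' (0x3F), which is also mbstring's default.
mb_substitute_character(0x3F);

// Hypothetical helper for this example: convert UTF-8 text to ISO-8859-1 (latin1).
function to_latin1($utf8_text)
{
    return mb_convert_encoding($utf8_text, 'ISO-8859-1', 'UTF-8');
}

$japanese = 'アラビア語のスポーツニュースと情報';
$french   = 'Arabe des nouvelles sportives';

$japanese_latin1 = to_latin1($japanese); // one '?' per original character
$french_latin1   = to_latin1($french);   // plain ASCII survives unchanged

echo $japanese_latin1, "\n";
echo $french_latin1, "\n";
```

This matches the pattern in the question: the Croatian and French rows come through, while the Chinese, Hebrew and Japanese rows turn into one "?" per character.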
// First make sure your page is sent as UTF-8
header('Content-Type: text/html; charset=utf-8');
mb_detect_encoding is quite useless unless you already know what you are dealing with. You probably should not rely on it unless you specify the second and third arguments. Currently it probably does not return what you think it does.
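A rough sketch of what passing the second and third arguments could look like. The helper name ensure_utf8 and the ISO-8859-1 fallback are assumptions for illustration; the fallback only makes sense if you really know that non-UTF-8 input is latin1. It needs the mbstring extension.

```php
<?php
// Sketch: only re-encode a value when it is NOT already valid UTF-8.
// Assumes the mbstring extension; ensure_utf8 is a hypothetical helper name.
function ensure_utf8($value, $fallback_encoding = 'ISO-8859-1')
{
    // Explicit encoding list + strict mode: the result is 'UTF-8' only when
    // the bytes are valid UTF-8, otherwise false.
    if (mb_detect_encoding($value, 'UTF-8', true) === 'UTF-8') {
        return $value;
    }

    // Assumption: anything that is not valid UTF-8 is in the fallback encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback_encoding);
}

// "caf" followed by a latin1 e-acute byte (0xE9) is not valid UTF-8, so it is converted.
echo bin2hex(ensure_utf8("caf\xE9")), "\n"; // prints 636166c3a9
```

In the question's code this would replace the mb_detect_encoding/utf8_encode branch, so that form values which are already UTF-8 are left alone and never encoded twice.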
I see that the words that show up as ??????? are Arabic words, which must have the collation cp1256_general_ci, not UTF-8_general_ci. Change that; it may solve the problem.