UTF-8 Corrupted from MySQL to SQLite

I'm porting a PHP Web application I wrote from MySQL 5 to SQLite 3. The text encoding for both is UTF-8 (for all fields, tables, and databases). I'm having trouble transferring a geo database with special characters.

mb_detect_encoding() reports UTF-8 for the data returned by both databases.

For example,

Raw output:

MySQL (correct): Dārāb, Iran

SQLite (incorrect): DÄrÄb, Iran

JSON-encoded:

MySQL (correct): D\u0101r\u0101b, Iran

SQLite (incorrect): D\u00c4\u0081r\u00c4\u0081b, Iran
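The escaped output above looks like UTF-8 that was read as an 8-bit encoding and then encoded as UTF-8 a second time: \u00c4 and \u0081 are the two bytes (0xC4 0x81) that make up ā in UTF-8. The following is a minimal sketch that reproduces the symptom, assuming the intermediate encoding was ISO-8859-1 and that the mbstring extension is available. The variable names are illustrative only.

```php
<?php
// Correct UTF-8 text; U+0101 (ā) is written as its two UTF-8 bytes, C4 81.
$correct = "D\xC4\x81r\xC4\x81b, Iran";

// Simulate the suspected corruption: treat every byte as an ISO-8859-1
// character and encode the result as UTF-8 again (double encoding).
$corrupted = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo json_encode($correct), "\n";   // "D\u0101r\u0101b, Iran"
echo json_encode($corrupted), "\n"; // "D\u00c4\u0081r\u00c4\u0081b, Iran"

// Undo one layer of encoding. This only works if the assumption above
// (ISO-8859-1 as the intermediate encoding) matches what really happened.
$repaired = mb_convert_encoding($corrupted, 'ISO-8859-1', 'UTF-8');

var_dump($repaired === $correct);   // bool(true)
```

If the stored rows match this pattern, reversing one layer of encoding may recover them, but rows that passed through a different encoding would not round-trip this way.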

What fixes the problem:

$sqlite_output = utf8_encode($sqlite_output);
$sqlite_output = utf8_decode($sqlite_output);

I imagine there's a way of repairing the SQLite database. Thank you in advance.


You're probably going to have to transfer the data again from MySQL to SQLite. I don't think you can predictably revert to the proper encoding: it seems the UTF-8 input was interpreted as non-UTF-8 (or vice versa) when the data first arrived in SQLite, so it was not stored in a proper format.

So try the transfer again, making sure every step in the chain from MySQL to SQLite is aware of the UTF-8 encoding.


Well, thanks for the advice and comments. Unfortunately, no matter which configuration I chose, the data still came out corrupted. I ended up simply instantiating two PDO objects and, using a while loop, inserting one row at a time. (I used mysqldump's --no-data option to get the structure and modified that by hand.)

It took about 10 minutes to insert ~10,000 rows equal to 9.4MB of data on my 256MB CentOS box. (So if you're on a shared environment, be wary of the maximum execution time.) The SQLite database now returns proper Unicode data.
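A row-by-row copy between two PDO connections might look like the sketch below. It is not the code actually used here: the function name copyTable, the table and column names, and the connection details in the trailing comments are all hypothetical. It also wraps the inserts in a single transaction, which the original loop may not have done, since SQLite otherwise commits each insert separately.

```php
<?php
/**
 * Copy every row of one table from a source PDO connection to a target one.
 * $table and $columns are interpolated into the SQL, so they must come from
 * trusted code (identifiers cannot be bound as parameters).
 * Returns the number of rows copied.
 */
function copyTable(PDO $source, PDO $target, string $table, array $columns): int
{
    $columnList   = implode(', ', $columns);
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));

    $select = $source->query("SELECT $columnList FROM $table");
    $insert = $target->prepare(
        "INSERT INTO $table ($columnList) VALUES ($placeholders)"
    );

    $copied = 0;
    $target->beginTransaction(); // one commit for the whole table
    while ($row = $select->fetch(PDO::FETCH_NUM)) {
        $insert->execute($row);  // values are bound as-is, byte for byte
        $copied++;
    }
    $target->commit();

    return $copied;
}

// Hypothetical usage. The charset option in the DSN asks MySQL to send UTF-8
// (use charset=utf8 on servers older than 5.5.3, which lack utf8mb4).
// $mysql  = new PDO('mysql:host=localhost;dbname=geo;charset=utf8mb4', 'user', 'password');
// $sqlite = new PDO('sqlite:/path/to/geo.sqlite');
// $mysql->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// $sqlite->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// copyTable($mysql, $sqlite, 'cities', ['id', 'name']);
```

Setting the error mode to exceptions on both connections makes a failed insert stop the loop instead of silently skipping rows.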

Note to self: It's easier to code a workaround than to find the recommended solution.


The default PHP distribution builds libsqlite in ISO-8859-1 encoding mode. However, this is a misnomer; rather than handling ISO-8859-1, it operates according to your current locale settings for string comparisons and sort ordering. So, rather than ISO-8859-1, you should think of it as being '8-bit' instead.
