Easy way of converting PHP serialized strings to UTF-8?

I'm trying to convert a Greek database to UTF-8. I've figured out how to do the conversion itself (via MySQL, not through the iconv() function), but I have a problem: the application stores lots of data in the database in PHP serialized format (via serialize()).

As you may know, this format stores the length of each string, in bytes, inside the serialized string. Greek characters that take one byte in a single-byte legacy encoding take two bytes in UTF-8, and PHP 5 has no native Unicode support, so the stored lengths no longer match after the conversion and those strings can't be unserialized anymore.
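
To make the failure concrete, here is a minimal sketch. It assumes the legacy data was ISO-8859-7 (the question does not name the encoding), that the mbstring extension is available, and that this snippet itself is saved as UTF-8:

```php
<?php
// Assumption: the legacy encoding is ISO-8859-7; the question does not say which one it was.
$utf8  = 'αβγ';                                              // 6 bytes in UTF-8
$greek = mb_convert_encoding($utf8, 'ISO-8859-7', 'UTF-8');  // 3 bytes in ISO-8859-7

// serialize() records the byte length (3) of the legacy string.
$serialized = serialize($greek);

// Converting the whole serialized blob grows the payload to 6 bytes,
// but the recorded length is still 3.
$converted = mb_convert_encoding($serialized, 'UTF-8', 'ISO-8859-7');

// The length prefix no longer matches, so unserialize() gives up and returns false.
$result = @unserialize($converted);
var_dump(strlen($greek), strlen($utf8), $result);
```

The `@` only silences the notice or warning that unserialize() raises on malformed input.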

So far, I'm considering using one of the following approaches to work around this:

  1. Use PHP to convert those strings to utf8, and instead of converting the whole serialized string, unserialize it and convert every item in the array.
  2. Write a script to re-calculate the lengths of the serialized strings.
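
Option 2 from the list above could be sketched roughly as follows. The function name `fix_serialized_lengths` is made up for illustration, and the regular expression is a known-fragile shortcut: it miscounts whenever a stored string itself contains the sequence `";`, so treat this as a sketch and not a safe general solution.

```php
<?php
// Hypothetical helper: rewrite every s:N:"..."; token so that N is the
// real byte length of the payload. Breaks if a payload contains '";'.
function fix_serialized_lengths(string $data): string
{
    $fixed = preg_replace_callback(
        '/s:(\d+):"(.*?)";/s',
        function (array $m): string {
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $data
    );
    return $fixed ?? $data; // preg_replace_callback() returns null on error
}

// A serialized array whose string was converted to UTF-8 after the fact:
// the prefix says 3 bytes, the payload is really 6 (this file is UTF-8).
$broken = 'a:1:{i:0;s:3:"αβγ";}';
$fixed  = fix_serialized_lengths($broken);
$value  = unserialize($fixed);
```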

Option #2 seems easier, but I'm thinking there has to be a quicker way to do this. Maybe even a freely available script for converting them, since I'm definitely not the first one to face this problem. Any ideas?

Thanks in advance.


Run SHOW CREATE TABLE and check the table's character set. Then connect to the database with that same character set (execute SET NAMES 'that_charset'; or call mysqli_set_charset()).

Now, when you retrieve the serialized string, unserialize() it. The result will be whatever your application passed to serialize().

Once you get here, you'll need to know which encoding the strings were originally inserted in (e.g. ISO-8859-1, CP1252, or ISO-8859-7 for Greek) so you can convert them to UTF-8.

Now that you have your Greek (no pun intended) converted to UTF-8 strings, serialize() the data again and put it back into the database.

I would highly recommend you reorganize the database to NOT use serialized strings to store data. If you are storing BLOBs in your database, consider moving them out of the database and storing them on your file system.
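
Why the original encoding matters: the same byte decodes to different characters depending on which encoding you assume. A small sketch, assuming the mbstring extension is available:

```php
<?php
// One legacy byte, two possible readings.
$byte = "\xE1";

// Read as ISO-8859-1 it is "á" (U+00E1); read as ISO-8859-7 it is "α" (U+03B1).
$as_latin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$as_greek  = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-7');

// Print the resulting UTF-8 bytes in hex to compare them.
echo bin2hex($as_latin1), "\n";
echo bin2hex($as_greek), "\n";
```

Guessing the wrong source encoding does not raise an error; it silently produces the wrong text, so check a few known values before converting everything.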

Good luck.


Option #1 sounds way easier and less error-prone to me.

You could probably just unserialize the value, then use array_walk_recursive() to do the conversion on each string.
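
A minimal sketch of that idea, assuming the strings were stored as ISO-8859-7 and that mbstring is available. The helper name `convert_strings_to_utf8` and the sample data are made up for illustration:

```php
<?php
// Hypothetical helper: convert every string inside an unserialized value to UTF-8.
// Note: array_walk_recursive() visits values only, so array keys are left as-is.
function convert_strings_to_utf8($value, string $from)
{
    if (is_string($value)) {
        return mb_convert_encoding($value, 'UTF-8', $from);
    }
    if (is_array($value)) {
        array_walk_recursive($value, function (&$item) use ($from) {
            if (is_string($item)) {
                $item = mb_convert_encoding($item, 'UTF-8', $from);
            }
        });
    }
    return $value;
}

// Build a legacy-style serialized value (this file is assumed to be UTF-8).
$legacy = serialize([
    'title' => mb_convert_encoding('Αθήνα', 'ISO-8859-7', 'UTF-8'),
    'tags'  => [mb_convert_encoding('πόλη', 'ISO-8859-7', 'UTF-8')],
    'count' => 3,
]);

// Unserialize first, convert each string, then serialize again so that
// serialize() writes the new byte lengths itself.
$converted = serialize(convert_strings_to_utf8(unserialize($legacy), 'ISO-8859-7'));
$back = unserialize($converted);
```

Because serialize() recomputes the lengths, nothing has to be patched by hand; non-string values such as the integer pass through unchanged.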


Here is specific code to do it. Just insert your settings/code at the TODO keywords:

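//TODO: insert your settings here
$database = 'your_db_name';
$table = 'your_table_name';
$column = 'column_that_needs_conversion';
$primarykey = 'name_of_primary_key_in_that_table';
// Encoding the strings had before the conversion. utf8_encode()/utf8_decode()
// only handle ISO-8859-1 (and are deprecated since PHP 8.2), so Greek data
// needs an explicit source encoding such as ISO-8859-7.
$source_encoding = 'ISO-8859-7';

if (mb_internal_encoding() != 'UTF-8') {
    die('This script must be run in a UTF-8 environment!');
}

//TODO: insert code here that opens your connection as $db (a mysqli link using a UTF-8 connection charset)

// create_function() was removed in PHP 8; a closure does the same job.
$to_utf8_callback = function (&$item, $key) use ($source_encoding) {
    if (is_string($item)) {
        $item = mb_convert_encoding($item, 'UTF-8', $source_encoding);
    }
};

$tablecol = $table .'.'. $column;
$getvaluesSQL = "SELECT ". $tablecol ." AS thevalue, ". $primarykey ." AS primkey FROM ". $database .".". $table ." WHERE ". $tablecol ." IS NOT NULL AND LENGTH(". $tablecol .") > 0";

$db_getvalues = mysqli_query($db, $getvaluesSQL);

if ($db_getvalues && mysqli_num_rows($db_getvalues) > 0) {
    while ($getvalues = mysqli_fetch_assoc($db_getvalues)) {
        // Convert the stored value back to its original bytes so the
        // recorded string lengths match again, then unserialize it.
        $php = unserialize(mb_convert_encoding($getvalues['thevalue'], $source_encoding, 'UTF-8'));

        if ($php === false) {
            // unserialize() failed (or the value was a serialized false): leave the row untouched.
            continue;
        }

        if (is_array($php)) {
            array_walk_recursive($php, $to_utf8_callback);
        } elseif (is_string($php)) {
            $php = mb_convert_encoding($php, 'UTF-8', $source_encoding);
        }

        // serialize() recomputes the byte lengths for the UTF-8 strings.
        $new_ser = serialize($php);

        # For checking that conversion happened correctly (compare the two files):
        #file_put_contents('c:/dump0.txt', $getvalues['thevalue'] ."\r\n", FILE_APPEND);
        #file_put_contents('c:/dump1.txt', $new_ser ."\r\n", FILE_APPEND);

        // Escape both values instead of relying on an undefined sql_esc() helper.
        $sql = "UPDATE ". $database .".". $table ." SET ". $tablecol ." = '". mysqli_real_escape_string($db, $new_ser) ."' WHERE ". $primarykey ." = '". mysqli_real_escape_string($db, $getvalues['primkey']) ."'";

        mysqli_query($db, $sql);
    }
}
echo '<div>Done with '. $tablecol .'</div>';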