开发者

MySQL: Change Collation from utf8_bin to utf8_unicode_ce

Given a full table, how do I cha开发者_开发知识库nge the collation from utf8_bin to utf8_unicode_ce? The normal "alter" query does not work, because of "duplicate entry errors". For example there are two entries

David Hussa

and

David Hußa

I know they are the same. Is there an elegant way to tell MySQL to "merge" the entrys? I should mention, that the id of the entries are used in other tables as reference so this has to be respected too by MySQL. Or do I have to do this the long and annoying way: Means merging every duplicate manually and then change the collation?

The table looks like this:

delimiter $$

CREATE TABLE `authors` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) COLLATE utf8_bin NOT NULL,
  `count` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`),
  FULLTEXT KEY `name_FULLTEXT` (`name`)
) ENGINE=MyISAM AUTO_INCREMENT=930710 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Stores all authors from dblp.xml.'$$


You can delete the duplicate entries:

DELETE  a2
FROM    authors a1
JOIN    authors a2
ON      a2.name COLLATE UTF8_GENERAL_CI = a1.name COLLATE UTF8_GENERAL_CI
        AND a2.id < a1.id

Note that this may take long time if your table is large.

It would be better to do this:

  • Drop the UNIQUE constraint

  • Change the collation

  • Create a plain, non-unique index on name

  • Run the query (without COLLATE clause):

    DELETE  a2
    FROM    authors a1
    JOIN    authors a2
    ON      a2.name = a1.name
            AND a2.id < a1.id
    
  • Drop the index

  • Recreate the UNIQUE constraint.

To update the referencing tables, run this queries before deleting the entries:

UPDATE  child c
JOIN    (
        (
        SELECT  name COLLATE utf8_unicode_ci AS name_ci, MAX(id) AS mid
        FROM    authors
        GROUP BY
                name_ci
        ) pa
        JOIN    authors a
        ON      a.name COLLATE utf8_unicode_ci = name_ci
        )
ON      c.author = a.id
SET     author = mid;

on all referencing tables.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜