Removing duplicates from mysql using php
This conserns two tables: "wp_extra_data" and "wp_posts". wp_extra_data has the fields "id", "po开发者_StackOverflow社区st_id" and "post_title_md5". Now I need to figure out how to retrieve duplicates from "post_title_md5" and then *delete from wp_posts where id = wp_extra_data->post_id except for the oldest post_id (aka the one with the lowest ID*
Example: wp_extra_data may contain:
ID post_id post_title_md5
23 45 322a85c7502dfb2fce5bbd6794c00f81
56 92 322a85c7502dfb2fce5bbd6794c00f81
75 12 322a85c7502dfb2fce5bbd6794c00f81
In which case I'd want the script to "remove from wp_posts where id = 92 and 12". So in short; how do I efficiently retrieve duplicates from wp_extra_data and then remove all of them except the one with the lowest key(ID). May be worth to note that I do not know what "post_title_md5" will be up front.
mysql_query("Delete from wp_posts where ID not in
(select post_id from wp_extra_data where id in(select min(ID)
from wp_extra_data
group by post_title_md5))")
Retrieving the duplicate hashes is easy:
SELECT post_title_md5
FROM wp_extra_data
GROUP BY post_title_md5
HAVING COUNT(post_title_md5) > 1
Then you can retrieve the post_id's using the above as part of a subquery:
SELECT post_id
FROM wp_extra_data
WHERE (post_title_md5 IN (... previous query here ...));
However, this'll retrieve all the IDs with the duplicated hash. So you'd have to do some filter on this PHP-side and delete only the left-over IDs after you've eliminated the "first" one.
精彩评论