Removing duplicate entries from MySQL database
I have a table with 8 columns, but over time it has picked up numerous duplicate rows. I have looked at another question on a similar topic, but it does not solve the issue I am currently having.
+---------------------------------------------------------------------------------------+
| id | market | agent | report_name | producer_code | report_date | entered_date | sync |
+---------------------------------------------------------------------------------------+
A unique entry is defined by the market, agent, report_name, producer_code, and report_date fields. What I am looking for is a way to list all the duplicate entries and delete them, or simply to delete the duplicates outright.
I have thought about doing it with a script, but the table contains 2.5 million entries, and the time it would take makes that infeasible.
Could anybody suggest any alternatives? I have seen people get a list of duplicates using the following query, but I am not sure how to adapt it to my situation:
SELECT id, count(*) AS n
FROM table_name
GROUP BY id
HAVING n > 1
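For my table, I am guessing the grouping would need to be on the key fields rather than id, something like the query below, but I am not sure how to get from that list to actually deleting the extra rows:
SELECT market, agent, report_name, producer_code, report_date, count(*) AS n
FROM table_name
GROUP BY market, agent, report_name, producer_code, report_date
HAVING n > 1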
Here are two strategies you might think about. You will have to adjust the columns used to select duplicates based upon what you actually consider a duplicate. I just included all of your listed columns other than the id column.
The first simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table. Of course, if you have any foreign key constraints you'll have to deal with those as well.
create table table_copy like table_name;
insert into table_copy
(id, market, agent, report_name, producer_code, report_date, entered_date, sync)
select min(id), market, agent, report_name, producer_code, report_date,
entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
entered_date, sync;
RENAME TABLE table_name TO table_old, table_copy TO table_name;
drop table table_old;
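Before running that final drop table, it may be worth a quick sanity check that nothing unexpected was lost, for example by comparing the number of distinct key combinations in the old table with the row count of the new one (the aliases here are just illustrative; adjust the column list to whatever you actually grouped by):
select count(*) as expected_rows
from (
    select 1
    from table_old
    group by market, agent, report_name, producer_code, report_date,
             entered_date, sync
) grouped;

select count(*) as actual_rows
from table_name;
The two counts should match before you drop the old table.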
The second strategy, which just deletes the duplicates, uses a temporary table to hold the information about what rows have duplicates since MySQL won't allow you to select from the same table you are deleting from in a subquery. Simply create a temporary table with the columns that identify the duplicates plus an id column that will actually hold the id to keep and then you can do a multi-table delete where you join the two tables to select just the duplicates.
create temporary table dups
select min(id) as id, market, agent, report_name, producer_code, report_date,
entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
entered_date, sync
having count(*) > 1;
delete t
from table_name t, dups d
where t.id != d.id
and t.market = d.market
and t.agent = d.agent
and t.report_name = d.report_name
and t.producer_code = d.producer_code
and t.report_date = d.report_date
and t.entered_date = d.entered_date
and t.sync = d.sync;
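After the delete you can re-run the same grouping to confirm that no duplicates remain, and drop the helper table if you do not want to wait for it to disappear at the end of the session:
select market, agent, report_name, producer_code, report_date,
       entered_date, sync, count(*) as n
from table_name
group by market, agent, report_name, producer_code, report_date,
         entered_date, sync
having n > 1;

drop temporary table if exists dups;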
You can find the dupes, based on your "key" fields, by doing:
select min(id) as id, count(*) as row_count
from table_name
group by market, agent, report_name, producer_code, report_date
having (row_count > 1)
which you could then use in a delete script. Of course, you'd have to be very careful doing this: it returns one row per duplicated grouping (with the lowest id of each group), and when you delete you'd want to keep at least ONE row from each grouping.
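One common way to act on that list, sketched here on the assumption that the lowest id in each group is the one worth keeping, is a multi-table delete that joins the table to itself and removes every row that has a counterpart with a smaller id on the same key fields:
delete t1
from table_name t1
join table_name t2
  on  t1.market        = t2.market
  and t1.agent         = t2.agent
  and t1.report_name   = t2.report_name
  and t1.producer_code = t2.producer_code
  and t1.report_date   = t2.report_date
  and t1.id            > t2.id;
On 2.5 million rows this join will be slow without an index covering those five columns, so one of the copy-into-a-new-table approaches above may well finish faster.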
Another easy way would be to
- create a new table
- put a UNIQUE index on the fields you need to be unique (a primary key is a special kind of unique index)
- use INSERT IGNORE INTO newtable SELECT * FROM oldtable (add an ORDER BY if it matters whether the first or the last record of each group survives, in case the other columns differ)
- DROP the old table and RENAME the new table to the old name
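A rough sketch of those steps, assuming the same columns as above, that id stays the primary key, and that the lowest id per group is the one to keep (the index name uniq_report is just a placeholder):
create table table_new like table_name;

alter table table_new
  add unique key uniq_report (market, agent, report_name, producer_code, report_date);

-- rows that would violate the unique key are silently skipped;
-- ordering by id means the earliest row of each group survives
insert ignore into table_new
select * from table_name
order by id;

rename table table_name to table_old, table_new to table_name;
drop table table_old;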
You may also put a unique key (or primary key) on the columns the unique entries are based on; this will prevent new records with duplicate details from being added.
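If id remains the primary key, the same effect can be had with a unique index on the existing table once the duplicates are gone; any later attempt to insert a duplicate will then fail with a duplicate-key error (the index name is again just a placeholder):
alter table table_name
  add unique key uniq_report (market, agent, report_name, producer_code, report_date);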