开发者

finding duplicate rows on more than one field

I am using this query to find duplicates based on two fields:

SELECT 
    last_name, 
    first_name,
    middle_initial,
    COUNT(last_name) AS Duplicates,
    IF(rec_id = '', 1, 0) AS has_REC_ID 
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;

Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.

Ultimately, what I'm trying to do is find which rows have matching first and last names--and then for each of those pairs, find the one that has ('') as a value for rec_id, assign the rec_id value from the one that DOES have a rec_id, and then delete the record that had a rec_id in the first place.

So for starters I though I would create a new column and do something like this:

UPDATE files a 
SET a.has_dup    --new column
    = if(a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                        )
      , 1, null);

But MySQL returns: "You can't specify target table 'a' for update in from clause"

I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?

UPDATE: I also tried:

UPDATE files a 
SET a.has_dup = 1
WHERE a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                    开发者_StackOverflow );

...and got the same error message.


You could:

1) Create a holding table

2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""

3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""

4) Update the rows in the original table that have a matching first and last name and have rec_id = "".

5) Drop the holding table

So something like:

create table temp
(
firstname varchar(100) not null,
lastname varchar(100) not null,
rec_id int not null
);


insert into temp (select firstname,lastname,rec_id from files where firstname =    lastname and rec_id != '');


delete from files where firstname = lastname and rec_id != '';

update files f
set f.rec_id = (select t.rec_id from temp t where f.firstname = t.firstname and f.lastname = t.lastname)
where f.firstname = f.lastname 
and f.rec_id != '';


drop table temp;


From the documentation:

Currently, you cannot update a table and select from the same table in a subquery.

I can't think of a quick workaround to that.


Update

Apparently, there is a "quick" workaround, but whether or not it's performant is another issue. It's all about adding a new layer of indirection by introducing a temporary table:

UPDATE files a 
SET a.has_dup    --new column
    = if(a.last_name IN (
                     SELECT b.last_name
                     FROM
                          (SELECT * FROM files)      -- new table target
                     b
                     GROUP BY b.last_name 
                     HAVING COUNT(b.last_name) > 1
                    ),
      1, null);


I don't have any MySQL to test, but this I think this should be work: (EDITED->FAIL)

UPDATE files
SET has_dup
    = if(last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                      )
      , 1, null);

EDITED: Another try:

UPDATE files f, (SELECT b.last_name
                   FROM files b
               GROUP BY b.last_name 
                 HAVING COUNT(b.last_name) > 1
                ) as duplicates
   SET f.has_dup = 1
 WHERE f.last_name = duplicates.last_name
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜