finding duplicate rows on more than one field

2023-04-10 08:13 问答作者：

I am using this query to find duplicates based on two fields:

SELECT 
    last_name, 
    first_name,
    middle_initial,
    COUNT(last_name) AS Duplicates,
    IF(rec_id = '', 1, 0) AS has_REC_ID 
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;

Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.

Ultimately, what I'm trying to do is find which rows have matching first and last names--and then for each of those pairs, find the one that has ('') as a value for rec_id, assign the rec_id value from the one that DOES have a rec_id, and then delete the record that had a rec_id in the first place.

So for starters I though I would create a new column and do something like this:

UPDATE files a 
SET a.has_dup    --new column
    = if(a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                        )
      , 1, null);

But MySQL returns: "You can't specify target table 'a' for update in from clause"

I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?

UPDATE: I also tried:

UPDATE files a 
SET a.has_dup = 1
WHERE a.last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                    开发者_StackOverflow );

...and got the same error message.

You could:

1) Create a holding table

2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""

3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""

4) Update the rows in the original table that have a matching first and last name and have rec_id = "".

5) Drop the holding table

So something like:

create table temp
(
firstname varchar(100) not null,
lastname varchar(100) not null,
rec_id int not null
);


insert into temp (select firstname,lastname,rec_id from files where firstname =    lastname and rec_id != '');


delete from files where firstname = lastname and rec_id != '';

update files f
set f.rec_id = (select t.rec_id from temp t where f.firstname = t.firstname and f.lastname = t.lastname)
where f.firstname = f.lastname 
and f.rec_id != '';


drop table temp;

From the documentation:

Currently, you cannot update a table and select from the same table in a subquery.

I can't think of a quick workaround to that.

Update

Apparently, there is a "quick" workaround, but whether or not it's performant is another issue. It's all about adding a new layer of indirection by introducing a temporary table:

UPDATE files a 
SET a.has_dup    --new column
    = if(a.last_name IN (
                     SELECT b.last_name
                     FROM
                          (SELECT * FROM files)      -- new table target
                     b
                     GROUP BY b.last_name 
                     HAVING COUNT(b.last_name) > 1
                    ),
      1, null);

I don't have any MySQL to test, but this I think this should be work: (EDITED->FAIL)

UPDATE files
SET has_dup
    = if(last_name IN (
                         SELECT b.last_name
                         FROM files b
                         GROUP BY b.last_name 
                         HAVING COUNT(b.last_name) > 1
                      )
      , 1, null);

EDITED: Another try:

UPDATE files f, (SELECT b.last_name
                   FROM files b
               GROUP BY b.last_name 
                 HAVING COUNT(b.last_name) > 1
                ) as duplicates
   SET f.has_dup = 1
 WHERE f.last_name = duplicates.last_name

继续阅读：duplicates

finding duplicate rows on more than one field

Update

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Update

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？