Tricky MS Access SQL query to remove surplus duplicate records

2023-01-18 19:21 Q&A author:

I have an Access table of the following form (simplified a bit):

ID            AutoNumber       Primary Key
SchemeName    Text (50)
SchemeNumber  Text (15)

It contains data such as:

ID            SchemeName           SchemeNumber
--------------------------------------------------------------------
714           Malcolm              ABC123
80            Malcolm              ABC123
96            Malcolms Scheme      ABC123
101           Malcolms Scheme      ABC123
98            Malcolms Scheme      DEF888
654           Another Scheme       BAR876
543           Whatever Scheme      KJL111
etc...

Now, I want to remove duplicate names under the same SchemeNumber, but I want to keep the record that has the longest SchemeName for that scheme number. If several records share that longest length, I want to keep only one of them, say the one with the lowest ID (though any one will do). In the example above I would want to delete IDs 714, 80 and 101, leaving only 96 for ABC123.

I thought this would be relatively easy to achieve, but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop through it programmatically, but I'd rather have a single DELETE query.
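To pin down the rule before writing any SQL, here is a minimal Python sketch (not Access code) that applies it to the sample rows above: per SchemeNumber, keep the longest SchemeName, break ties with the lowest ID, and report every other row as surplus. The function name `ids_to_delete` is hypothetical and only for illustration.

```python
# Sample rows from the question: (ID, SchemeName, SchemeNumber)
ROWS = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]


def ids_to_delete(rows):
    """Return the set of IDs a single DELETE should remove."""
    keeper = {}  # SchemeNumber -> (ID, SchemeName) of the row to keep
    for row_id, name, number in rows:
        best = keeper.get(number)
        # Prefer the longer name; on equal length prefer the lower ID.
        if best is None or (len(name), -row_id) > (len(best[1]), -best[0]):
            keeper[number] = (row_id, name)
    kept_ids = {row_id for row_id, _ in keeper.values()}
    return {row_id for row_id, _, _ in rows if row_id not in kept_ids}


print(sorted(ids_to_delete(ROWS)))  # [80, 101, 714]
```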

See if this query returns the rows you want to keep:

SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
    (SELECT
        SchemeNumber,
        SchemeName,
        Len(SchemeName) AS name_length,
        ID
    FROM tblSchemes
    ) AS r
    INNER JOIN
    (SELECT
        SchemeNumber,
        Max(Len(SchemeName)) AS name_length
    FROM tblSchemes
    GROUP BY SchemeNumber
    ) AS w
    ON
        (r.SchemeNumber = w.SchemeNumber)
        AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;

If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.

DELETE 
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);

Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:

DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
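One way to try the keep-query-plus-NOT EXISTS approach outside Access is an in-memory SQLite database through Python's standard `sqlite3` module. This is only a sketch under some assumptions: SQLite uses `LENGTH` where Access uses `Len`, the saved query becomes a view named `qrySchemes2Keep`, the `ORDER BY` is dropped because it does not affect which rows are kept, and the outer table is referenced by name instead of the alias `s`.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

# The keep query saved as a view (LENGTH stands in for Access's Len).
KEEP_VIEW = """
CREATE VIEW qrySchemes2Keep AS
SELECT r.SchemeNumber, r.SchemeName, MIN(r.ID) AS MinOfID
FROM (SELECT SchemeNumber, SchemeName, LENGTH(SchemeName) AS name_length, ID
      FROM tblSchemes) AS r
INNER JOIN (SELECT SchemeNumber, MAX(LENGTH(SchemeName)) AS name_length
            FROM tblSchemes
            GROUP BY SchemeNumber) AS w
  ON r.SchemeNumber = w.SchemeNumber AND r.name_length = w.name_length
GROUP BY r.SchemeNumber, r.SchemeName
"""

# Remove every row whose ID is not one of the keepers.
DELETE_SURPLUS = """
DELETE FROM tblSchemes
WHERE NOT EXISTS (SELECT * FROM qrySchemes2Keep WHERE MinOfID = tblSchemes.ID)
"""


def keep_then_delete():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", SAMPLE)
    con.execute(KEEP_VIEW)
    keep = sorted(r[0] for r in con.execute("SELECT MinOfID FROM qrySchemes2Keep"))
    con.execute(DELETE_SURPLUS)
    remaining = sorted(r[0] for r in con.execute("SELECT ID FROM tblSchemes"))
    con.close()
    return keep, remaining


print(keep_then_delete())  # ([96, 98, 543, 654], [96, 98, 543, 654])
```

Because the keep query groups by SchemeName as well as SchemeNumber, two different names that tie for the longest length under the same SchemeNumber would both be kept.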

DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
             WHERE t1.SchemeNumber = t2.SchemeNumber
             AND Length(t2.SchemeName) > Length(t1.SchemeName)
)

Depending on your RDBMS, you may need a different function than Length (Oracle: LENGTH, MySQL: LENGTH, SQL Server: LEN, Access: Len).
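To see what this EXISTS version does with the sample data, here is a small SQLite sketch through Python's `sqlite3` module. Assumptions: the placeholder `Table` is renamed `tblSchemes` (since `TABLE` is a reserved word), and SQLite's `LENGTH` plays the role of `Len`. Because the query only removes names that are strictly shorter, it leaves both 96 and 101 in place, so the lowest-ID tie-break still needs another step.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

# Delete a row if another row in the same scheme has a strictly longer name.
DELETE_SHORTER = """
DELETE FROM tblSchemes
WHERE EXISTS (SELECT 1 FROM tblSchemes AS t2
              WHERE tblSchemes.SchemeNumber = t2.SchemeNumber
                AND LENGTH(t2.SchemeName) > LENGTH(tblSchemes.SchemeName))
"""


def delete_shorter_names():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", SAMPLE)
    con.execute(DELETE_SHORTER)
    remaining = sorted(r[0] for r in con.execute("SELECT ID FROM tblSchemes"))
    con.close()
    return remaining


print(delete_shorter_names())  # [96, 98, 101, 543, 654]
```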

delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
  on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
  and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName)
       or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName)
           and ShortScheme.ID > LongScheme.ID))

(SQL Server flavored)

Now updated to include the specified tie resolution. However, you may get better performance by doing it in two queries: first delete the schemes with shorter names, as in my original query, and then go back and delete the higher ID wherever there is a tie in name length.
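SQLite has no `DELETE ... JOIN`, so this sketch rewrites the self-join as a correlated `EXISTS` and compares the one-step delete with the two-step variant. It runs through Python's `sqlite3` module with `LENGTH` for `len` and a table named `tblSchemes`; none of that comes from the SQL Server answer itself. Both paths should leave the same rows.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

# One step: a row loses if another row in its scheme has a longer name,
# or an equally long name with a lower ID.
ONE_STEP = """
DELETE FROM tblSchemes
WHERE EXISTS (SELECT 1 FROM tblSchemes AS other
              WHERE other.SchemeNumber = tblSchemes.SchemeNumber
                AND (LENGTH(tblSchemes.SchemeName) < LENGTH(other.SchemeName)
                     OR (LENGTH(tblSchemes.SchemeName) = LENGTH(other.SchemeName)
                         AND tblSchemes.ID > other.ID)))
"""

# Two steps: drop shorter names first, then the higher IDs among equal lengths.
TWO_STEPS = [
    """DELETE FROM tblSchemes
       WHERE EXISTS (SELECT 1 FROM tblSchemes AS other
                     WHERE other.SchemeNumber = tblSchemes.SchemeNumber
                       AND LENGTH(other.SchemeName) > LENGTH(tblSchemes.SchemeName))""",
    """DELETE FROM tblSchemes
       WHERE EXISTS (SELECT 1 FROM tblSchemes AS other
                     WHERE other.SchemeNumber = tblSchemes.SchemeNumber
                       AND LENGTH(other.SchemeName) = LENGTH(tblSchemes.SchemeName)
                       AND other.ID < tblSchemes.ID)""",
]


def remaining_after(statements):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", SAMPLE)
    for sql in statements:
        con.execute(sql)
    ids = sorted(r[0] for r in con.execute("SELECT ID FROM tblSchemes"))
    con.close()
    return ids


print(remaining_after([ONE_STEP]))  # [96, 98, 543, 654]
print(remaining_after(TWO_STEPS))   # [96, 98, 543, 654]
```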

I'd do this in multiple steps. Large delete operations done in a single step make me too nervous: what if you make a mistake? There's no SQL 'undo' statement.

-- Setup the data
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS bar;
DROP TABLE IF EXISTS bat;
DROP TABLE IF EXISTS baz;
CREATE TABLE foo (
  id int(11) NOT NULL,
  SchemeName varchar(50),
  SchemeNumber varchar(15),
  PRIMARY KEY (id)
);

insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme', 'BAR876' );
insert into foo values (543, 'Whatever Scheme', 'KJL111' );

-- For each SchemeNumber that has duplicates, find the longest name length
create table bar as
    select max(length(SchemeName)) as max_length, SchemeNumber
    from foo
    group by SchemeNumber
    having count(*) > 1;

-- Find the one we want to keep
create table bat as
    select min(a.id) as id, a.SchemeNumber
    from foo a join bar b on a.SchemeNumber = b.SchemeNumber 
       and length(a.SchemeName) = b.max_length
    group by a.SchemeNumber;

-- Select into this table all the rows to delete
create table baz as 
    select a.id from foo a join bat b on a.SchemeNumber = b.SchemeNumber
    where a.id != b.id;

This gives you a new table, baz, containing only the IDs of the rows you want to remove.

Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.

Then, when you're ready, delete the rows with this command:

delete from foo where id in (select id from baz);

This seems like more work because of the extra tables, but it's safer and probably just as fast as the other ways. Plus, you can stop at any step and make sure the data is what you want before you do any actual deletes.
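The staged approach can be rehearsed in SQLite from Python before touching real data. This sketch assumes SQLite's `CREATE TABLE ... AS SELECT` and `LENGTH` behave like their MySQL counterparts for this purpose. It builds `bar`, `bat` and `baz` in the same way, refuses to continue if `baz` does not hold exactly the IDs you expect, and only then deletes. The `expected` parameter is a hypothetical safety check, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (id, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

STAGING = [
    # Longest name length for each SchemeNumber that has duplicates
    """CREATE TABLE bar AS
       SELECT MAX(LENGTH(SchemeName)) AS max_length, SchemeNumber
       FROM foo GROUP BY SchemeNumber HAVING COUNT(*) > 1""",
    # Lowest id among the longest names: the row to keep
    """CREATE TABLE bat AS
       SELECT MIN(a.id) AS id, a.SchemeNumber
       FROM foo a JOIN bar b
         ON a.SchemeNumber = b.SchemeNumber AND LENGTH(a.SchemeName) = b.max_length
       GROUP BY a.SchemeNumber""",
    # Every other row in those schemes is surplus
    """CREATE TABLE baz AS
       SELECT a.id FROM foo a JOIN bat b ON a.SchemeNumber = b.SchemeNumber
       WHERE a.id <> b.id""",
]


def staged_delete(expected):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO foo VALUES (?, ?, ?)", SAMPLE)
    for sql in STAGING:
        con.execute(sql)
    staged = {r[0] for r in con.execute("SELECT id FROM baz")}
    if staged != expected:
        # Stop before deleting anything if the staged ids look wrong.
        con.close()
        raise ValueError(f"unexpected rows staged for deletion: {sorted(staged)}")
    con.execute("DELETE FROM foo WHERE id IN (SELECT id FROM baz)")
    remaining = sorted(r[0] for r in con.execute("SELECT id FROM foo"))
    con.close()
    return remaining


print(staged_delete({80, 101, 714}))  # [96, 98, 543, 654]
```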

If your platform supports ranking functions and common table expressions:

with cte as (
  select row_number() 
     over (partition by SchemeNumber order by len(SchemeName) desc, ID) as rn
  from Table)
delete from cte where rn > 1;
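SQLite 3.25 or newer supports window functions and a `WITH` clause on `DELETE`, but it cannot delete through a CTE, so this sketch keeps the ranking in the CTE and deletes from the base table by ID. It assumes a table named `tblSchemes`, uses `LENGTH` for `len`, and sorts by `ID` as a second key so that the lowest ID wins a tie, as the question asks.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

# Rank rows per scheme: longest name first, lowest ID first among ties.
RANKED_DELETE = """
WITH ranked AS (
  SELECT ID,
         ROW_NUMBER() OVER (PARTITION BY SchemeNumber
                            ORDER BY LENGTH(SchemeName) DESC, ID) AS rn
  FROM tblSchemes)
DELETE FROM tblSchemes WHERE ID IN (SELECT ID FROM ranked WHERE rn > 1)
"""


def delete_ranked():
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25+, found "
                           + sqlite3.sqlite_version)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", SAMPLE)
    con.execute(RANKED_DELETE)
    remaining = sorted(r[0] for r in con.execute("SELECT ID FROM tblSchemes"))
    con.close()
    return remaining


print(delete_ranked())  # [96, 98, 543, 654]
```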

Try this:

   Select * From Table t
   Where Len(SchemeName) <
      (Select Max(Len(SchemeName))
       From Table
       Where SchemeNumber = t.SchemeNumber)
    Or Id >
      (Select Min(Id)
       From Table
       Where SchemeNumber = t.SchemeNumber
           And Len(SchemeName) = Len(t.SchemeName))

or this:

   Select * From Table t
   Where Id <>
      (Select Min(Id) From Table
       Where SchemeNumber = t.SchemeNumber
         And Len(SchemeName) =
            (Select Max(Len(SchemeName))
             From Table
             Where SchemeNumber = t.SchemeNumber))

If either of these selects exactly the records that should be deleted, just change it to a delete:

   Delete
   From Table t
   Where Len(SchemeName) <
      (Select Max(Len(SchemeName))
       From Table
       Where SchemeNumber = t.SchemeNumber)
    Or Id >
      (Select Min(Id)
       From Table
       Where SchemeNumber = t.SchemeNumber
           And Len(SchemeName) = Len(t.SchemeName))

or, using the second construction:

   Delete From Table t
   Where Id <>
      (Select Min(Id) From Table
       Where SchemeNumber = t.SchemeNumber
         And Len(SchemeName) =
            (Select Max(Len(SchemeName))
             From Table
             Where SchemeNumber = t.SchemeNumber))
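As a sanity check on the correlated-subquery approach, this SQLite sketch runs two deletes in that spirit against the sample data through Python's `sqlite3`. The first removes a row if its name is shorter than the longest in its scheme or it is not the lowest ID among names of its length; the second removes every row except the lowest-ID longest name. `tblSchemes` and `LENGTH` are assumptions standing in for `Table` and `Len`.

```python
import sqlite3

# Sample rows from the question: (ID, SchemeName, SchemeNumber)
SAMPLE = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

# Shorter than the longest name, or not the lowest ID among equal lengths.
CONSTRUCTION_1 = """
DELETE FROM tblSchemes
WHERE LENGTH(SchemeName) <
      (SELECT MAX(LENGTH(x.SchemeName)) FROM tblSchemes AS x
       WHERE x.SchemeNumber = tblSchemes.SchemeNumber)
   OR ID >
      (SELECT MIN(x.ID) FROM tblSchemes AS x
       WHERE x.SchemeNumber = tblSchemes.SchemeNumber
         AND LENGTH(x.SchemeName) = LENGTH(tblSchemes.SchemeName))
"""

# Anything other than the lowest ID among the longest names.
CONSTRUCTION_2 = """
DELETE FROM tblSchemes
WHERE ID <>
      (SELECT MIN(x.ID) FROM tblSchemes AS x
       WHERE x.SchemeNumber = tblSchemes.SchemeNumber
         AND LENGTH(x.SchemeName) =
             (SELECT MAX(LENGTH(y.SchemeName)) FROM tblSchemes AS y
              WHERE y.SchemeNumber = tblSchemes.SchemeNumber))
"""


def remaining_after(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, "
                "SchemeName TEXT, SchemeNumber TEXT)")
    con.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", SAMPLE)
    con.execute(sql)
    ids = sorted(r[0] for r in con.execute("SELECT ID FROM tblSchemes"))
    con.close()
    return ids


print(remaining_after(CONSTRUCTION_1))  # [96, 98, 543, 654]
print(remaining_after(CONSTRUCTION_2))  # [96, 98, 543, 654]
```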
