Tricky MS Access SQL query to remove surplus duplicate records
I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with th开发者_JAVA技巧e same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programatically but I'd rather have a single DELETE query.
See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depend on your RDBMS you may use function different from Length (Oracle - length, mysql - length, sql server - LEN)
delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.
I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP Table foo;
DROP Table bar;
DROP Table bat;
DROP Table baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b where a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then when you're ready, use this command to delete the rows using this command.
delete from foo where id in (select id from baz);
This seems like more work because of the different tables, but it's safer probably just as fast as the other ways. Plus you can stop at any step and make sure the data is what you want before you do any actual deletes.
If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;
try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:,...
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
if either of these selects the records that should be deleted, just change it to a delete
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
精彩评论