How to delete completely duplicate rows
Say i have duplicate rows in my table and well my database design is of 3rd class :-
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap开发者_如何学C','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Lux','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Crowning Glory','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (2,'Cinthol','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
I want only 1 instance of each row should be present in my table. Thus 2nd, 3rd and last row
whcih are completely identical should be deleted. What query can i write for this? Can it be done without creating temp tables? Just in one single query?
Thanks in advance :)
Try this - it will delete all duplicates from your table:
;WITH duplicates AS
(
SELECT
ProductID, ProductName, Description, Category,
ROW_NUMBER() OVER (PARTITION BY ProductID, ProductName
ORDER BY ProductID) 'RowNum'
FROM dbo.tblProduct
)
DELETE FROM duplicates
WHERE RowNum > 1
GO
SELECT * FROM dbo.tblProduct
GO
Your duplicates should be gone now: output is:
ProductID ProductName DESCRIPTION Category
1 Cinthol cosmetic soap soap
1 Lux cosmetic soap soap
1 Crowning Glory cosmetic soap soap
2 Cinthol nice soap soap
3 Lux nice soap soap
DELETE tblProduct
FROM tblProduct
LEFT OUTER JOIN (
SELECT MIN(ProductId) as ProductId, ProductName, Description, Category
FROM tblProduct
GROUP BY ProductName, Description, Category
) as KeepRows ON
tblProduct.ProductId= KeepRows.ProductId
WHERE
KeepRows.ProductId IS NULL
Stolen from How can I remove duplicate rows?
UPDATE:
This will only work if ProductId is a Primary Key (which it is not). You are better off using @marc_s' method, but I'll leave this up in case someone using a PK comes across this post.
I had to do this a few weeks back... what version of SQL Server are you using? In SQL Server 2005 and up, you can use Row_Number as part of your select, and only select where Row_Number is 1. I forget the exact syntax, but it's well documented... something along the lines of:
Select t0.ProductID,
t0.ProductName,
t0.Description,
t0.Category
Into tblCleanData
From (
Select ProductID,
ProductName,
Description,
Category,
Row_Number() Over (
Partition By ProductID,
ProductName,
Description,
Category
Order By ProductID,
ProductName,
Description,
Category
) As RowNumber
From MyTable
) As t0
Where t0.RowNumber = 1
Check out http://msdn.microsoft.com/en-us/library/ms186734.aspx, that should get you going in the right direction.
First use a SELECT... INTO
:
SELECT DISTINCT ProductID, ProductName, Description, Category
INTO tblProductClean
FROM tblProduct
The drop the first table.
精彩评论