Fixing duplicate rows in a table
I have a table like below
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
which has following v开发者_如何学Pythonalue
1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'
I want to update this table so that it has the following values
1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'
Could someone help me out with this
Use a cursor to move over the table and try to insert every row in a second temporary table. If you get a collision (technically with a select
), you can run a second query to get the maximum number (if any) that's appended to your item.
Once you know what maximum number is used (use isnull
to cover the case of the first duplicate) just run an update over your original table and keep going with your scan.
Are you looking to remove duplicates? or just change the values so they aren't duplicate?
to change the values use
update producttotals set value = 'abc_1' where id =2;
update producttotals set value = 'abc_2' where id =3;
to find duplicate rows do a select id, value from producttotals group by id, value having count() > 2;
Assuming SQL Server 2005 or greater
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
INSERT INTO @ProductTotals
VALUES (1, 'abc'),
(2, 'abc'),
(1, 'abc'),
(3, 'abc')
;WITH CTE as
(SELECT
ROW_NUMBER() OVER (Partition by value order by id) rn,
id,
value
FROM
@ProductTotals),
new_values as (
SELECT
pt.id,
pt.value,
pt.value + '_' + CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value
FROM
@ProductTotals pt
INNER JOIN CTE
ON pt.id = CTE.id
and pt.value = CTE.value
WHERE
pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value
UPDATE
@ProductTotals
SET
pt.value = nv.new_value
FROM
@ProductTotals pt
inner join new_values nv
ON pt.id = nv.id and pt.value = nv.value
SELECT * FROM @ProductTotals
Will produce the following
id value
----------- --------------------------------------------------
1 abc
2 abc_1
1 abc
3 abc_2
Explanation of the SQL
The first CTE creates a row number Value. So the numbering gets restarted whenever it sees a new value
rn id value
-------------------- ----------- --------
1 1 abc
2 1 abc
3 2 abc
4 3 abc
The second CTE called new_values ignores any IDs that are assoicated with with a RN of 1. So rn 1 and rn 2 get removed because they share the same ID. It also uses ROW_NUMBER() again to determine the number for the new_value
id value new_value
----------- ------ -------------
2 abc abc_1
3 abc abc_2
The final statement just updates the Old value with the new value
精彩评论