开发者

Fixing duplicate rows in a table

I have a table like below

DECLARE @ProductTotals TABLE 
(
  id int, 

  value nvarchar(50)
)

which has following v开发者_如何学Pythonalue

1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'

I want to update this table so that it has the following values

1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'

Could someone help me out with this


Use a cursor to move over the table and try to insert every row in a second temporary table. If you get a collision (technically with a select), you can run a second query to get the maximum number (if any) that's appended to your item.

Once you know what maximum number is used (use isnull to cover the case of the first duplicate) just run an update over your original table and keep going with your scan.


Are you looking to remove duplicates? or just change the values so they aren't duplicate?

to change the values use

update producttotals set value = 'abc_1' where id =2;

update producttotals set value = 'abc_2' where id =3;

to find duplicate rows do a select id, value from producttotals group by id, value having count() > 2;


Assuming SQL Server 2005 or greater

DECLARE @ProductTotals TABLE 
(
  id int, 

  value nvarchar(50)
)
INSERT INTO @ProductTotals
VALUES (1, 'abc'),
        (2, 'abc'),
        (1, 'abc'),
        (3, 'abc')


;WITH CTE as 
(SELECT 
    ROW_NUMBER() OVER (Partition by value order by id) rn,
    id,
    value
FROM
    @ProductTotals),
new_values  as (
SELECT
    pt.id,
    pt.value,
    pt.value + '_' +  CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value



FROM
    @ProductTotals pt
    INNER JOIN CTE
    ON pt.id = CTE.id
     and pt.value = CTE.value   
WHERE
    pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value

UPDATE
    @ProductTotals 
SET
    pt.value = nv.new_value
FROM 
@ProductTotals pt 
inner join new_values nv
ON pt.id = nv.id and pt.value = nv.value

SELECT * FROM @ProductTotals 

Will produce the following

id          value
----------- --------------------------------------------------
1           abc
2           abc_1
1           abc
3           abc_2

Explanation of the SQL

The first CTE creates a row number Value. So the numbering gets restarted whenever it sees a new value

rn                   id          value
-------------------- ----------- --------
1                    1           abc
2                    1           abc
3                    2           abc
4                    3           abc

The second CTE called new_values ignores any IDs that are assoicated with with a RN of 1. So rn 1 and rn 2 get removed because they share the same ID. It also uses ROW_NUMBER() again to determine the number for the new_value

id          value  new_value
----------- ------ -------------
2           abc    abc_1
3           abc    abc_2

The final statement just updates the Old value with the new value

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜