Can this row-by-row T-SQL script be converted to a set-based script?

2023-01-15 03:20 问答作者：

I have written this T-SQL script to roll up duplicate rows in a database that are created by a reconstruction process. To do this, it performs the following:

Gets and stores the minimum target table primary key (ColTargetPK) per set of duplicate entries in a table variable (@minColTargetPKTable); determining duplicates by matching on three columns - ColIntA, ColIntB and ColDateTimeA.
Sets the Target Column to be rolled up (TargetColVarchar) to equal a concatenation of the target column of the corresponding duplicate entries.
Flags the duplicate entries as inactive (ColTargetStatus = 0)
Reports success (or failure)

Due to the size of the dataset involved, this script takes an inappopriate length of time to run.

Can anyone see how this can be converted to be set-based, if so, could you please provide an example?

I apologise in advance if my description is a bit confusing...

declare @MinColTargetPKTable table
    (ColIntA int,
     ColIntB int,
     ColDateTimeA nvarchar(25),
     minColTargetPK int
    )

insert @minColTargetPKtable
    select ColIntA, ColIntB, convert(nvarchar(25),ColDateTimeA,120) as ColDateTimeA, 
    min(ColTargetPK) as MinColTargetPK from TargetColTable
    group by ColIntA, ColIntB, convert(nvarchar(25),ColDateTimeA,120) 


declare @TargetColVarchar varchar(max)

declare @updatedColTargetPKs table
(updatedColTargetPKs int)

declare @minColTargetPK int

declare cur cursor
for
select minColTargetPK
from @minColTargetPKtable

open cur

fetch next from cur into @minColTargetPK

while @@FETCH_STATUS = 0
begin
    begin try

    set @TargetColVarchar =
        convert(nvarchar(max),(
        select replace(convert(nvarchar(max), isnull(TargetColVarchar,'')) +   convert (nvarchar(max),' \par \par \par'), '\par } ', '\par') as 
        TargetColVarchar
        from TargetColTable v1
        where ColIntA = (select ColIntA from TargetColTable where ColTargetPK = @minColTargetPK)
        and ColIntB = (select ColIntB from TargetColTable where ColTargetPK = @minColTargetPK)
        and convert(nvarchar(25),ColDateTimeA,120) = (select convert(nvarchar(25),ColDateTimeA,120) from TargetColTable where ColTargetPK = @minColTargetPK)
        order by ColTargetPK
        for xml path(''), type  
        ))

        set @TargetColVarchar = REPLACE(REPLACE (REPLACE (@TargetColVarchar,'<TargetColVarchar>',''),'</TargetColVarchar>',''), '&#x0D;','')


        update TargetColTable
        set TargetColVarchar = @TargetColVarchar
        where ColTargetPK = @minColTargetPK

        update TargetColTable
        set ColTargetStatus = 0
        from TargetColTable v1
        where ColIntA = (select ColIntA from TargetColTable where ColTargetPK = @minColTargetPK)
        and ColIntB = (select ColIntB from TargetColTable where ColTargetPK = @minColTargetPK)
        and convert(nvarchar(25),ColDateTimeA,120) = (select convert(nvarchar(25),ColDateTimeA,120) from TargetColTable where ColTargetPK = @minColTargetPK)
        and ColTargetPK != @minColTargetPK

        Print 'Merge complete for ColTargetPK '+ convert(varchar(50),  @minColTargetPK)

    end try
    begin catch

        Print 'Merge failed for ColTargetPK '+ convert (varchar(20),@minColTargetPK)

    end catch

    fetch next from cur into @minColTargetPK
end

close cur
deallocate cur

EDIT: Ok, below is the script moved to a set-based operation using Preet's suggestion. To give some additional background, TargetTable is approximately 1.1 million rows. Strangely enough, the set-based script below is not significantly faster than the cursor-based script below on the same subset of data (approx. 20000 rows) over 2 trials. Any thoughts on why this wouldn't be faster?

declare @minColTargetPKTable table
    (
    ColIntA int,
    ColIntB int,
    ColDateTimeA nvarchar(25),
    ColTargetPK int,
    concTargetCol varchar(max)
    )

insert @minColTargetPKtable (minColIntA,ColIntB,minColDateTimeA,minColTargetPK)
select ColIntA, ColIntB, convert(nvarchar(25),ColDateTimeA,120) as ColDateTimeA, min(ColTargetPK) as minColTargetPK from TargetTable
group by ColIntA, ColIntB, convert(nvarchar(25),ColDateTimeA,120) 


update @minColTargetPKTable 
set concTargetCol  = 
(REPLACE(REPLACE(REPLACE(replace(convert(nvarchar(max),
    (
        select convert(nvarchar(max), isnull(TargetColVarchar,'')) + convert (nvarchar(max),' \par \par \par ') as 
        TargetColVarchar
        from TargetTable v1
        where ColIntA = (select ColIntA from TargetTable where ColTargetPK = minColTargetPK)
        and ColIntB = (select ColIntB from TargetTable where ColTargetPK = minColTargetPK)
        and convert(nvarchar(25),ColDateTimeA,120) = (select convert(nvarchar(25),ColDateTimeA,120) from TargetTable where ColTargetPK = minColTargetPK)
        order by ColTargetPK
        for xml path(''), type  
    ))
, '\par } ', '\par '),'<TargetColVarchar>',''),'</TargetColVarchar>',''), '&#x0D;',''))

update TargetTable 
set TargetColVarchar = mv.concTargetCol
from @minColTargetPKTable mv
w开发者_如何转开发here mv.minColTargetPK = TargetTable.ColTargetPK 

update TargetTable 
set TargetColStatus = 0 
from TargetTable v
inner join @minColTargetPKTable mv on
mv.minColIntA = v.ColIntA
and mv.minColDateTimeA = convert(nvarchar(25),v.ColDateTimeA,120)
and mv.ColIntB = v.ColIntB 
where ColTargetPK not in (select minColTargetPK from @minColTargetPKTable)

Ok the I'd sugggest the following:

Add an extra column to the temp table to @TargetColVarchar value, do this one hit
Join the temp table and TargetColTable to the do the update

You can then optimise based on the execution plans

Update:

Looking at your amended results, I'd say the following is in order:

use a #temp table, these tend to be more performant on large datasets.

a. add more columns to the temp table to record things like : (select ColIntA from TargetColTable where ColTargetPK = @minColTargetPK and (select ColIntB from TargetColTable where ColTargetPK = @minColTargetPK in the big hit up front

b. The string replace is slow I reckon. This will still be slow. I know XML is not the fastest thing in the world. Can you replace the string comp with SQL XML specfic code

c. In the second update at the bottom the where ColTargetPK not in (select minColTargetPK from @minColTargetPKTable) is likely to be slower than a precise join, and you should do both the updated in one hit

However use the Actual Query Plan to work this out.

Try to do this to your first query

    from tct v1
    join tct v2 on v2.pk = @pk
    where v1.a = v2.a and v1.b = v2.b and v1.dt = v2.dt

and this to your second query

    from tct v1
    join tct v2 on v2.pk = @pk and v1.pk <> @pk
    where v1.a = v2.a and v1.b = v2.b and v1.dt = v2.dt

继续阅读：sql sql-server-2008

Can this row-by-row T-SQL script be converted to a set-based script?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？