Cleaning up bad comma-separated IDs

Following on from a previous question, I'm trying to clean up some data where IDs are stored as a comma-separated list of values. I need these broken out into separate rows. I have a query that works, but it is rather slow. Do you have any ideas that would be faster than what I'm doing?

SET NOCOUNT OFF
DECLARE @Conversion TABLE
(
    ID bigint
    , LogSearch_ID int
    , LogSearchDimension_ID int
    , SearchValue varchar(MAX)
)
DECLARE @RowsUpdated int, @MaxRows int, @NumUpdates int;
SET @MaxRows = 500;
SET @NumUpdates = 0;
SET @RowsUpdated = 1;
WHILE @RowsUpdated > 0 AND @NumUpdates < @MaxRows
BEGIN
    INSERT INTO @Conversion (ID, LogSearch_ID, LogSearchDimension_ID, SearchValue )
    SELECT TOP 1
        ID, LogSearch_ID, LogSearchDimension_ID, SearchValue
        FROM LogSearchesDimensions (NOLOCK)
        WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%';

    INSERT INTO LogSearchesDimensions (LogSearch_ID, LogSearchDimension_ID, SearchValue)
    SELECT
        c.LogSearch_ID
        , c.LogSearchDimension_ID
        , split.val
    FROM
        @Conversion c
        -- The split function returns a table with each piece as a row in column 'val'
        CROSS APPLY dbo.Split(c.SearchValue, 0, 0) AS split;

    SET @RowsUpdated = @@rowcount;
    SET @NumUpdates = @NumUpdates + 1;
    DELETE FROM LogSearchesDimensions WHERE ID = (SELECT ID FROM @Conversion)
    DELETE FROM @Conversion;

END

The split function looks like this (I didn't write it myself):

CREATE FUNCTION SPLIT
(
  @s nvarchar(max),
  @trimPieces bit,
  @returnEmptyStrings bit
)
returns @t table (val nvarchar(max))
as
begin

declare @i int, @j int
select @i = 0, @j = (len(@s) - len(replace(@s,',','')))

;with cte
as
(
  select
    i = @i + 1,
    s = @s,
    n = substring(@s, 0, charindex(',', @s)),
    m = substring(@s, charindex(',', @s)+1, len(@s) - charindex(',', @s))

  union all

  select
    i = cte.i + 1,
    s = cte.m,
    n = substring(cte.m, 0, charindex(',', cte.m)),
    m = substring(
      cte.m,
      charindex(',', cte.m) + 1,
      len(cte.m)-charindex(',', cte.m)
    )
  from cte
  where i <= @j
)
insert into @t (val)
select pieces
from
(
  select
  case
    when @trimPieces = 1
    then ltrim(rtrim(case when i <= @j then n else m end))
    else case when i <= @j then n else m end
  end as pieces
  from cte
) t
where
  (@returnEmptyStrings = 0 and len(pieces) > 0)
  or (@returnEmptyStrings = 1)
option (maxrecursion 0)

return

end

GO
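For reference, calling the function directly on a sample value (the string here is made up) returns one row per piece:

SELECT val
FROM dbo.Split('100,200,300', 1, 0);

-- val
-- -----
-- 100
-- 200
-- 300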

So what the query is doing is grabbing a single row that has a comma-separated value in it, breaking it out into multiple rows, inserting those back into the dimensions table, and then deleting the original row. It's taking forever to work through the updates. Do you have any suggestions for improvement?
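For concreteness, here's a made-up example of what one pass of the loop should do (the IDs and values are hypothetical):

-- Before: one row with a comma-separated SearchValue
-- ID    LogSearch_ID  LogSearchDimension_ID  SearchValue
-- 17    42            5                      '100,200,300'

-- After: the original row is deleted and replaced with one row per value
-- LogSearch_ID  LogSearchDimension_ID  SearchValue
-- 42            5                      '100'
-- 42            5                      '200'
-- 42            5                      '300'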


Here's the final solution I settled on. It's not terribly fast, but it's stable and faster than all of the looping to split strings.

SET NOCOUNT ON
DECLARE @MaxRows int, @NumUpdates int, @SQL varchar(max);
SET @MaxRows = 100;
SET @NumUpdates = 0;
WHILE @NumUpdates < @MaxRows
BEGIN
    BEGIN TRANSACTION
        SET @SQL = (
        SELECT TOP 1
            'INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID) SELECT '
            + REPLACE(SearchValue, ',', ', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ' UNION ALL SELECT ')
            + ', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ';'
            + 'DELETE FROM LogSearchesDimensions WHERE ID = ' + CAST(ID AS varchar) + ';'
            FROM LogSearchesDimensions (NOLOCK)
            WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%'
        )

        -- No rows with comma-separated values left to convert
        IF @SQL IS NULL
        BEGIN
            COMMIT
            BREAK
        END

        EXEC(@SQL);
        SET @NumUpdates = @NumUpdates + 1;
    COMMIT
END
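For a sense of what each iteration executes: given a hypothetical row with ID 17, LogSearch_ID 42, LogSearchDimension_ID 5 and SearchValue '100,200,300', the generated batch would look roughly like this (line breaks added for readability):

INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID)
SELECT 100, 42, 5
UNION ALL SELECT 200, 42, 5
UNION ALL SELECT 300, 42, 5;
DELETE FROM LogSearchesDimensions WHERE ID = 17;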


Instead of calling a split function inside your cursor through the table, try something like this:

DECLARE @sql varchar(MAX);
SELECT @sql = 'insert into mytable(id, otherfield1, otherfield2) select '
  + REPLACE(@idfield, ',', ', ' + @otherfield1 + ', ' + @otherfield2 + ' union all select ')
  -- append the other fields for the last value as well
  + ', ' + @otherfield1 + ', ' + @otherfield2;
EXEC(@sql);

Then, after the cursor finishes working through the rows that have comma-separated values, run a simple DELETE statement to remove the original rows.

This assumes otherfield1 and otherfield2 are numeric; otherwise you'll need to do some escaping in that dynamic SQL.
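If the values were strings, a rough sketch of that escaping (using the table and column names from the question; the rest is illustrative and untested against the real schema) might look like:

DECLARE @sql varchar(max);

SELECT TOP 1 @sql =
      'INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID) SELECT '''
    -- double up embedded single quotes, then turn each comma into "', id1, id2 UNION ALL SELECT '"
    + REPLACE(REPLACE(SearchValue, '''', ''''''), ',',
              ''', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ' UNION ALL SELECT ''')
    + ''', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ';'
FROM LogSearchesDimensions (NOLOCK)
WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%';

EXEC(@sql);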


Doing the splitting in SQL will be slow. Have you considered exporting the data to a flat file and using an SSIS package to re-import it?
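If a full SSIS package feels like overkill, the same flat-file round trip can be sketched with bcp and BULK INSERT (the server name, database name, file paths, and staging table below are placeholders):

-- Export the offending rows from a command prompt:
--   bcp "SELECT ID, LogSearch_ID, LogSearchDimension_ID, SearchValue FROM MyDb.dbo.LogSearchesDimensions WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%'" queryout C:\temp\dims.txt -c -t"|" -S MyServer -T

-- After splitting the values outside SQL Server, load the cleaned rows into a staging table:
CREATE TABLE dbo.LogSearchesDimensions_Staging
(
    LogSearch_ID int,
    LogSearchDimension_ID int,
    SearchValue varchar(max)
);

BULK INSERT dbo.LogSearchesDimensions_Staging
FROM 'C:\temp\dims_split.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');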
