Cleaning up bad comma-separated IDs

Following on from a previous question, I'm trying to clean up some data where IDs are stored as a comma-separated list of values. I need these broken out into separate rows. I have a query that works, but it is rather slow. Do you have any ideas that would be faster than what I'm doing?

SET NOCOUNT OFF
DECLARE @Conversion TABLE
(
    ID bigint
    , LogSearch_ID int
    , LogSearchDimension_ID int
    , SearchValue varchar(MAX)
)
DECLARE @RowsUpdated int, @MaxRows int, @NumUpdates int;
SET @MaxRows = 500;
SET @NumUpdates = 0;
SET @RowsUpdated = 1;
WHILE @RowsUpdated > 0 AND @NumUpdates < @MaxRows
BEGIN
    INSERT INTO @Conversion (ID, LogSearch_ID, LogSearchDimension_ID, SearchValue )
    SELECT TOP 1
        ID, LogSearch_ID, LogSearchDimension_ID, SearchValue
        FROM LogSearchesDimensions (NOLOCK)
        WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%';

    INSERT INTO LogSearchesDimensions (LogSearch_ID, LogSearchDimension_ID, SearchValue)
    SELECT
        c.LogSearch_ID
        , c.LogSearchDimension_ID
        , split.val
    FROM
        @Conversion c
        -- The split function returns a table with each piece as a row in column 'val'
        CROSS APPLY dbo.Split(c.SearchValue, 0, 0) AS split;

    SET @RowsUpdated = @@rowcount;
    SET @NumUpdates = @NumUpdates + 1;
    DELETE FROM LogSearchesDimensions WHERE ID = (SELECT ID FROM @Conversion)
    DELETE FROM @Conversion;

END

The split function looks like this (I didn't write it myself):

CREATE FUNCTION SPLIT
(
  @s nvarchar(max),
  @trimPieces bit,
  @returnEmptyStrings bit
)
returns @t table (val nvarchar(max))
as
begin

declare @i int, @j int
select @i = 0, @j = (len(@s) - len(replace(@s,',','')))

;with cte
as
(
  select
    i = @i + 1,
    s = @s,
    n = substring(@s, 0, charindex(',', @s)),
    m = substring(@s, charindex(',', @s)+1, len(@s) - charindex(',', @s))

  union all

  select
    i = cte.i + 1,
    s = cte.m,
    n = substring(cte.m, 0, charindex(',', cte.m)),
    m = substring(
      cte.m,
      charindex(',', cte.m) + 1,
      len(cte.m)-charindex(',', cte.m)
    )
  from cte
  where i <= @j
)
insert into @t (val)
select pieces
from
(
  select
  case
    when @trimPieces = 1
    then ltrim(rtrim(case when i <= @j then n else m end))
    else case when i <= @j then n else m end
  end as pieces
  from cte
) t
where
  (@returnEmptyStrings = 0 and len(pieces) > 0)
  or (@returnEmptyStrings = 1)
option (maxrecursion 0)

return

end

GO
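For reference, calling the function directly on a sample value (the string here is made up) returns one row per piece:

SELECT val
FROM dbo.Split('100,200,300', 1, 0);

-- val
-- -----
-- 100
-- 200
-- 300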

So what the query is doing is grabbing a single row that has a comma-separated value in it, breaking it out into multiple rows, inserting those back into the dimensions table, and then deleting the original row. It's taking forever to work through the updates. Do you have any suggestions for improvement?
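For concreteness, here's a made-up example of what one pass of the loop should do (the IDs and values are hypothetical):

-- Before: one row with a comma-separated SearchValue
-- ID    LogSearch_ID  LogSearchDimension_ID  SearchValue
-- 17    42            5                      '100,200,300'

-- After: the original row is deleted and replaced with one row per value
-- LogSearch_ID  LogSearchDimension_ID  SearchValue
-- 42            5                      '100'
-- 42            5                      '200'
-- 42            5                      '300'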


Here's the final solution I settled on. It's not terribly fast, but it's stable and faster than all of the looping to split strings.

SET NOCOUNT ON
DECLARE @MaxRows int, @NumUpdates int, @SQL varchar(max);
SET @MaxRows = 100;
SET @NumUpdates = 0;
WHILE @NumUpdates < @MaxRows
BEGIN
    BEGIN TRANSACTION
        SET @SQL = (
        SELECT TOP 1
            'INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID) SELECT '
            + REPLACE(SearchValue, ',', ', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ' UNION ALL SELECT ')
            + ', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ';'
            + 'DELETE FROM LogSearchesDimensions WHERE ID = ' + CAST(ID AS varchar) + ';'
            FROM LogSearchesDimensions (NOLOCK)
            WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%'
        )

        -- No rows with comma-separated values left to convert
        IF @SQL IS NULL
        BEGIN
            COMMIT
            BREAK
        END

        EXEC(@SQL);
        SET @NumUpdates = @NumUpdates + 1;
    COMMIT
END
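For a sense of what each iteration executes: given a hypothetical row with ID 17, LogSearch_ID 42, LogSearchDimension_ID 5 and SearchValue '100,200,300', the generated batch would look roughly like this (line breaks added for readability):

INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID)
SELECT 100, 42, 5
UNION ALL SELECT 200, 42, 5
UNION ALL SELECT 300, 42, 5;
DELETE FROM LogSearchesDimensions WHERE ID = 17;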


Instead of calling a split function inside your cursor through the table, try something like this:

DECLARE @sql varchar(MAX);
SELECT @sql = 'insert into mytable(id, otherfield1, otherfield2) select '
  + REPLACE(@idfield, ',', ', ' + @otherfield1 + ', ' + @otherfield2 + ' union all select ')
  -- append the other fields for the last value as well
  + ', ' + @otherfield1 + ', ' + @otherfield2;
EXEC(@sql);

Then, after the cursor finishes working through the rows that have comma-separated values, run a simple DELETE statement to remove the original rows.

This assumes otherfield1 and otherfield2 are numeric; otherwise you'll need to do some escaping in that dynamic SQL.
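If the values were strings, a rough sketch of that escaping (using the table and column names from the question; the rest is illustrative and untested against the real schema) might look like:

DECLARE @sql varchar(max);

SELECT TOP 1 @sql =
      'INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID) SELECT '''
    -- double up embedded single quotes, then turn each comma into "', id1, id2 UNION ALL SELECT '"
    + REPLACE(REPLACE(SearchValue, '''', ''''''), ',',
              ''', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ' UNION ALL SELECT ''')
    + ''', ' + CAST(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ';'
FROM LogSearchesDimensions (NOLOCK)
WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%';

EXEC(@sql);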


Doing the splitting in SQL will be slow. Have you considered exporting the data to a flat file and using an SSIS package to re-import it?
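If a full SSIS package feels like overkill, the same flat-file round trip can be sketched with bcp and BULK INSERT (the server name, database name, file paths, and staging table below are placeholders):

-- Export the offending rows from a command prompt:
--   bcp "SELECT ID, LogSearch_ID, LogSearchDimension_ID, SearchValue FROM MyDb.dbo.LogSearchesDimensions WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%'" queryout C:\temp\dims.txt -c -t"|" -S MyServer -T

-- After splitting the values outside SQL Server, load the cleaned rows into a staging table:
CREATE TABLE dbo.LogSearchesDimensions_Staging
(
    LogSearch_ID int,
    LogSearchDimension_ID int,
    SearchValue varchar(max)
);

BULK INSERT dbo.LogSearchesDimensions_Staging
FROM 'C:\temp\dims_split.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');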
