Cleaning up bad comma-separated IDs
Following on from a previous question, I'm trying to clean up some data where IDs are stored as a comma-separated list of values. I need to have these broken out into separate rows. I have what a query that works, but is rather slow. Do you have any ideas that would faster than what I'm doing?
SET NOCOUNT OFF
DECLARE @Conversion TABLE
(
ID bigint
, LogSearch_ID int
, LogSearchDimension_ID int
, SearchValue varchar(MAX)
)
DECLARE @RowsUpdated int, @MaxRows int, @NumUpdates int;
SET @MaxRows = 500;
SET @NumUpdates = 0;
SET @RowsUpdated = 1;
WHILE @RowsUpdated > 0 AND @NumUpdates < @MaxRows
BEGIN
INSERT INTO @Conversion (ID, LogSearch_ID, LogSearchDimension_ID, SearchValue )
SELECT TOP 1
ID, LogSearch_ID, LogSearchDimension_ID, SearchValue
FROM LogSearchesDimensions (NOLOCK)
WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%';
INSERT INTO LogSearchesDimensions (LogSearch_ID, LogSearchDimension_ID, SearchValue)
SELECT
LogSearch_ID
, LogSearchDimension_ID
, s
FROM
@Conversion
-- The split function returns a table value with each item as a row in column 's'
dbo.Split((SELECT SearchValue FROM @Conversion), 0, 0);
SET @RowsUpdated = @@rowcount;
SET @NumUpdates = @NumUpdates + 1;
DELETE FROM LogSearchesDimensions WHERE ID = (SELECT ID FROM @Conversion)
DELETE FROM @Conversion;
END
The split function looks like this (I didn't write it myself):
CREATE FUNCTION SPLIT
(
@s nvarchar(max),
@trimPieces bit,
@returnEmptyStrings bit
)
returns @t table (val nvarchar(max))
as
begin
declare @i int, @j int
select @i = 0, @j = (len(@s) - len(replace(@s,',','')))
;with cte
as
(
select
i = @i + 1,
s = @s,
n = substring(@s, 0, charindex(',', @s)),
m = substring(@s, charindex(',', @s)+1, len(@s) - charindex(',', @s))
union all
select
i = cte.i + 1,
s = cte.m,
n = substring(cte.m, 0, charindex(',', cte.m)),
m = substring(
cte.m,
charindex(',', cte.m) + 1,
len(cte.m)-charindex(',', cte.m)
)
from cte
where i <= @j
)
insert into @t (val)
select pieces
from
(
select
case
when @trimPieces = 1
then ltrim(rtrim(case when i <= @j then n else m end))
else case when i <= @j then n else m end
end as pieces
from cte
) t
where
(@returnEmptyStrings = 0 and len(pieces) > 0)
or (@returnEmptyStrings = 1)
option (maxrecursion 0)
return
end
GO
开发者_如何学GoSo what the query is doing is grabbing a single row that has a comma separate value in it, breaking it out into multiple rows, insert it back into the dimensions table, and then deleting the original row. It's taking forever to go through and run the updates. Do you have any suggestions for improvement?
Here's the final solution I settled on. Not terribly fast, but stable and faster than doing all of the looping to split strings.
SET NOCOUNT ON
DECLARE @RowsUpdated int, @MaxRows int, @NumUpdates int, @SQL varchar(max);
SET @MaxRows = 100;
SET @NumUpdates = 0;
SET @RowsUpdated = 1;
WHILE @RowsUpdated > 0 AND @NumUpdates < @MaxRows
BEGIN
BEGIN TRANSACTION
SET @SQL = (
SELECT TOP 1
'INSERT INTO LogSearchesDimensions (SearchValue, LogSearch_ID, LogSearchDimension_ID) SELECT '
+ REPLACE(SearchValue, ',', ', ' + Cast(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ' UNION ALL SELECT ')
+ ', ' + Cast(LogSearch_ID AS varchar) + ', ' + CAST(LogSearchDimension_ID AS varchar) + ';'
+ 'DELETE FROM LogSearchesDimensions WHERE ID = ' + CAST(ID AS varchar) + ';' AS SQL
FROM LogSearchesDimensions (NOLOCK)
WHERE LogSearchDimension_ID = 5 AND SearchValue LIKE '%,%'
)
SET @RowsUpdated = @@rowcount;
IF @RowsUpdated = 0
BREAK
SET @NumUpdates = @NumUpdates + 1;
COMMIT
END
Instead of a split inside your cursor through the table, try something like this:
DECLARE @sql varchar(MAX);
SELECT @sql = 'insert into mytable(id, otherfield1, otherfield2) select '
+ REPLACE(@idfield, ',', ', ' + @otherfield1 + ', ' + @otherfield2 union all select ')
EXEC(@SQL);
Then, after the cursor finishes working through rows that have comma-separated values, a simple delete statement.
This assumes otherfield
and otherfield2
are numeric, otherwise you'll need to do some escaping in that dynamic SQL.
Doing the SPLITing in SQL will be slow. Have you considered exporting the data to a flat file and using an SSIS package to re-import?
精彩评论