How to Replace Multiple Characters in SQL?
This is based on a similar question How to Replace Multiple Characters in Access SQL?
I wrote this since sql server 2005 seems to have a limit on replace() function to 19 replacements inside a where clause.
I have the following task: Need to perform a match on a column, and to improve the chances of a match stripping multiple un-needed chars using replace() function
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p0 NVarChar(1) SET @p0 = '!'
DECLARE @p1 NVarChar(1) SET @p1 = '@'
---etc...
SELECT *
FROM t1,t2
WHERE REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)
= REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)
---etc
If there are >19 REPLACE() in that where clause, it doesn't work. So the solution I came up with is to create a sql function called trimChars in this example (excuse them starting at @22
CREATE FUNCTION [trimChars] (
@string varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p22 NVarChar(1) SET @p22 = '^'
DECLARE @p23 NVarChar(1) SET @p23 = '&'
DECLARE @p24 NVarChar(1) SET @p24 = '*'
DECLARE @p25 NVarChar(1) SET @p25 = '('
DECLARE @p26 NVarChar(1) SET @p26 = '_'
DECLARE @p27 NVarChar(1) SET @p27 = ')'
DECLARE @p28 NVarChar(1) SET @p28 = '`'
DECLARE @p29 NVarChar(1) SET @p29 = '~'
DECLARE @p30 NVarChar(1) SET @p30 = '{'
DECLARE @p31 NVarChar(1) SET @p31 = '}'
DECLARE @p32 NVarChar(1) SET @p32 = ' '
DECLARE @p33 NVarChar(1) SET @p33 = '['
DECLARE @p34 NVarChar(1) SET @p34 = '?'
DECLARE @p35 NVarChar(1) SET @p35 = ']'
DECLARE @p36 NVarChar(1) SET @p36 = '\'
DECLARE @p37 NVarChar(1) SET @p37 = '|'
DECLARE @p38 NVarChar(1) SET @p38 = '<'
DECLARE @p39 NVarChar(1) SET @p39 = '>'
DECLARE @p40 NVarChar(1) SET @p40 = '@'
DECLARE @p41 NVarChar(1) SET @p41 = '-'
return REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
@string, @p22, @es), @p23, @es), @p24, @es), @p25, @es), @p26, @es), @p27, @es), @p28, @es), @p29, @es), @p30, @es), @p31, @es), @p32, @es), @p33, @es), @p34, @es), @p35, @es),开发者_Python百科 @p36, @es), @p37, @es), @p38, @es), @p39, @es), @p40, @es), @p41, @es)
END
This can then be used in addition to the other replace strings
SELECT *
FROM t1,t2
WHERE trimChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)
= REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es))
I created a few more functions to do similar replacing like so trimChars(trimMoreChars(
SELECT *
FROM t1,t2
WHERE trimChars(trimMoreChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)
= REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)))
Can someone give me a better solution to this problem in terms of performance and maybe a cleaner implementation?
One useful trick in SQL is the ability use @var = function(...)
to assign a value. If you have multiple records in your record set, your var is assigned multiple times with side-effects:
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'
I would seriously consider making a CLR UDF instead and using regular expressions (both the string and the pattern can be passed in as parameters) to do a complete search and replace for a range of characters. It should easily outperform this SQL UDF.
I really like @Juliett's solution! I would just use a CTE to get all the invalid characters:
DECLARE @badStrings VARCHAR(100)
DECLARE @teststring VARCHAR(100)
SET @badStrings = '><()!?@'
SET @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
;WITH CTE AS
(
SELECT SUBSTRING(@badStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(@badStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(@badStrings)
)
SELECT @teststring = REPLACE(@teststring, CTE.[String], '') FROM CTE
SELECT @teststring
Juliet ro0zs my s0xrzone
I suggest you to create a scalar user defined function. This is an example (sorry in advance, because the variable names are in spanish):
CREATE FUNCTION [dbo].[Udf_ReplaceChars] (
@cadena VARCHAR(500), -- String to manipulate
@caracteresElim VARCHAR(100), -- String of characters to be replaced
@caracteresReem VARCHAR(100) -- String of characters for replacement
)
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @cadenaFinal VARCHAR(500), @longCad INT, @pos INT, @caracter CHAR(1), @posCarER INT;
SELECT
@cadenaFinal = '',
@longCad = LEN(@cadena),
@pos = 1;
IF LEN(@caracteresElim)<>LEN(@caracteresReem)
BEGIN
RETURN NULL;
END
WHILE @pos <= @longCad
BEGIN
SELECT
@caracter = SUBSTRING(@cadena,@pos,1),
@pos = @pos + 1,
@posCarER = CHARINDEX(@caracter,@caracteresElim);
IF @posCarER <= 0
BEGIN
SET @cadenaFinal = @cadenaFinal + @caracter;
END
ELSE
BEGIN
SET @cadenaFinal = @cadenaFinal + SUBSTRING(@caracteresReem,@posCarER,1)
END
END
RETURN @cadenaFinal;
END
Here is an example using this function:
SELECT dbo.Udf_ReplaceChars('This is a test.','sat','Z47');
And the result is: 7hiZ iZ 4 7eZ7.
As you can see, each character of the @caracteresElim
parameter is replaced by the character in the same position from the @caracteresReem
parameter.
While this question was asked about SQL Server 2005, it's worth noting that as of Sql Server 2017, the request can be done with the new TRANSLATE function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
I hope this information helps people who get to this page in the future.
I had a one-off data migration issue where the source data could not output correctly some unusual/technical characters plus the ubiquitous extra commas in CSVs.
We decided that for each such character the source extract should replace them with something that was recognisable to both the source system and the SQL Server that was loading them but which would not be in the data otherwise.
It did mean however that in various columns across various tables these replacement characters would appear and I would have to replace them. Nesting multiple REPLACE functions made the import code look scary and prone to errors in misjudging the placement and number of brackets so I wrote the following function. I know it can process a column in a table of 3,000 rows in less than a second though I'm not sure how quickly it will scale up to multi-million row tables.
create function [dbo].[udf_ReplaceMultipleChars]
(
@OriginalString nvarchar(4000)
, @ReplaceTheseChars nvarchar(100)
, @LengthOfReplacement int = 1
)
returns nvarchar(4000)
begin
declare @RevisedString nvarchar(4000) = N'';
declare @lengthofinput int =
(
select len(@OriginalString)
);
with AllNumbers
as (select 1 as Number
union all
select Number + 1
from AllNumbers
where Number < @lengthofinput)
select @RevisedString += case
when (charindex(substring(@OriginalString, Number, 1), @ReplaceTheseChars, 1) - 1) % 2
= 0 then
substring(
@ReplaceTheseChars
, charindex(
substring(@OriginalString, Number, 1)
, @ReplaceTheseChars
, 1
) + 1
, @LengthOfReplacement
)
else
substring(@OriginalString, Number, 1)
end
from AllNumbers
option (maxrecursion 4000);
return (@RevisedString);
end;
It works by submitting both the string to be evaluated and have characters to be replaced (@OriginalString) along with a string of paired characters where the first character is to be replaced by the second, the third by the fourth, fifth by sixth and so on (@ReplaceTheseChars).
Here is the string of chars that I needed to replace and their replacements... [']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓
i.e. A opening square bracket denotes an apostrophe, a closing one a double quote. You can see that there were vulgar fractions as well as degrees and diameter symbols in there.
There is a default @LengthOfReplacement that is included as a starting point if anyone needed to replace longer strings. I played around with that in my project but the single char replacement was the main function.
The condition of the case statement is important. It ensures that it only replaces the character if it is found in your @ReplaceTheseChars variable and that the character has to be found in an odd numbered position (the minus 1 from charindex result ensures that anything NOT found returns a negative modulo value). i.e if you find a tilde (~) in position 5 it will replace it with a comma but if on a subsequent run it found the comma in position 6 it would not replace it with a curly bracket ({).
This can be best demonstrated with an example...
declare @ProductDescription nvarchar(20) = N'abc~def[¦][123';
select @ProductDescription
= dbo.udf_ReplaceMultipleChars(
@ProductDescription
/* NB the doubling up of the apostrophe is necessary in the string but resolves to a single apostrophe when passed to the function */
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
, default
);
select @ProductDescription
, dbo.udf_ReplaceMultipleChars(
@ProductDescription
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
/* if you didn't know how to type those peculiar chars in then you can build a string like this... '[' + nchar(0x0027) + ']"~,{' + nchar(0x00D8) + '}' + nchar(0x00B0) etc */
,
default
);
This will return both the value after the first pass through the function and the second time as follows... abc,def'¼"'123 abc,def'¼"'123
A table update would just be
update a
set a.Col1 = udf.ReplaceMultipleChars(a.Col1,'~,]"',1)
from TestTable a
Finally (I hear you say!), although I've not had access to the translate function I believe that this function can process the example shown in the documentation quite easily. The TRANSLATE function demo is
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');
which returns 2*(3+4)/(7-2) although I understand it might not work on 2*[3+4]/[7-2] !!
My function would approach this as follows listing each char to be replaced followed by its replacement [ --> (, { --> ( etc.
select dbo.udf_ReplaceMultipleChars('2*[3+4]/{7-2}', '[({(])})', 1);
which will also work for
select dbo.udf_ReplaceMultipleChars('2*[3+4]/[7-2]', '[({(])})', 1);
I hope someone finds this useful and if you get to test its performance against larger tables do let us know one way or another!
declare @testVal varchar(20)
set @testVal = '?t/es?ti/n*g 1*2?3*'
select @testVal = REPLACE(@testVal, item, '') from (select '?' item union select '*' union select '/') list
select @testVal;
One option is to use a numbers/tally table to drive an iterative process via a pseudo-set based query.
The general idea of char replacement can be demonstrated with a simple character map table approach:
create table charMap (srcChar char(1), replaceChar char(1))
insert charMap values ('a', 'z')
insert charMap values ('b', 'y')
create table testChar(srcChar char(1))
insert testChar values ('1')
insert testChar values ('a')
insert testChar values ('2')
insert testChar values ('b')
select
coalesce(charMap.replaceChar, testChar.srcChar) as charData
from testChar left join charMap on testChar.srcChar = charMap.srcChar
Then you can bring in the tally table approach to do the lookup on each character position in the string.
create table tally (i int)
declare @i int
set @i = 1
while @i <= 256 begin
insert tally values (@i)
set @i = @i + 1
end
create table testData (testString char(10))
insert testData values ('123a456')
insert testData values ('123ab456')
insert testData values ('123b456')
select
i,
SUBSTRING(testString, i, 1) as srcChar,
coalesce(charMap.replaceChar, SUBSTRING(testString, i, 1)) as charData
from testData cross join tally
left join charMap on SUBSTRING(testString, i, 1) = charMap.srcChar
where i <= LEN(testString)
I don't know why Charles Bretana deleted his answer, so I'm adding it back in as a CW answer, but a persisted computed column is a REALLY good way to handle these cases where you need cleansed or transformed data almost all the time, but need to preserve the original garbage. His suggestion is relevant and appropriate REGARDLESS of how you decide to cleanse your data.
Specifically, in my current project, I have a persisted computed column which trims all the leading zeros (luckily this is realtively easily handled in straight T-SQL) from some particular numeric identifiers stored inconsistently with leading zeros. This is stored in persisted computed columns in the tables which need it and indexed because that conformed identifier is often used in joins.
Here are the steps
- Create a CLR function
See following code:
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString Replace2(SqlString inputtext, SqlString filter,SqlString replacewith)
{
string str = inputtext.ToString();
try
{
string pattern = (string)filter;
string replacement = (string)replacewith;
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, replacement);
return (SqlString)result;
}
catch (Exception s)
{
return (SqlString)s.Message;
}
}
}
Deploy your CLR function
Now Test it
See following code:
create table dbo.test(dummydata varchar(255))
Go
INSERT INTO dbo.test values('P@ssw1rd'),('This 12is @test')
Go
Update dbo.test
set dummydata=dbo.Replace2(dummydata,'[0-9@]','')
select * from dbo.test
dummydata, Psswrd, This is test booom!!!!!!!!!!!!!
Here's a modern solution using STRING_SPLIT that's very concise. The drawback is that you need at least version SQL Server 2016 running at compatibility level 130.
Declare @strOriginal varchar(100) = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
Declare @strModified varchar(100) = @strOriginal
Declare @disallowed varchar(100) = '> < ( ) ! ? @'
Select
@strModified = Replace(@strModified, value, '')
From
String_Split(@disallowed,' ')
Select @strModified
It returns:
Juliet ro0zs my s0xrzone
create function RemoveCharacters(@original nvarchar(max) , @badchars nvarchar(max))
returns nvarchar(max)
as
begin
declare @len int = (select len(@badchars))
return REPLACE(TRANSLATE(@original, @badchars, replicate('#' , @len )), '#', '')
end
go
select dbo.RemoveCharacters('Hello World!' , 'lo!' )
--returns He Wrd
精彩评论