SQL Server: How do you remove punctuation from a field?

Does anyone know a good way to remove punctuation from a field in SQL Server?

I'm thinking

UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldName,',',''),'.',''),'''','')

but it seems a bit tedious when I intend to remove a large number of different characters, for example: !@#$%^&*()<>:"

Thanks in advance


Ideally, you would do this in an application language such as C# + LINQ, as mentioned in another answer.

If you want to do it purely in T-SQL, though, one way to make things neater is to first create a table that holds all the punctuation you want to remove.

CREATE TABLE Punctuation 
(
    Symbol VARCHAR(1) NOT NULL
)

INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')

Next, you could create a function in SQL to remove all the punctuation symbols from an input string.

CREATE FUNCTION dbo.fn_RemovePunctuation
(
    @InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
    SELECT
        @InputString = REPLACE(@InputString, P.Symbol, '')
    FROM 
        Punctuation P

    RETURN @InputString
END
GO

Then you can just call the function in your UPDATE statement:

UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)
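
Before running the UPDATE on real data, it may help to preview what would change. The following is a minimal sketch that assumes the dbo.fn_RemovePunctuation function and the tblMyTable table defined above; the Cleaned column alias is only illustrative. Note that the function takes VARCHAR(500), so longer values would be truncated when passed in.

```sql
-- Preview only the rows whose value would actually change.
SELECT FieldName,
       dbo.fn_RemovePunctuation(FieldName) AS Cleaned
FROM tblMyTable
WHERE FieldName <> dbo.fn_RemovePunctuation(FieldName);

-- Restrict the UPDATE to the same rows so untouched rows are not rewritten.
UPDATE tblMyTable
SET FieldName = dbo.fn_RemovePunctuation(FieldName)
WHERE FieldName <> dbo.fn_RemovePunctuation(FieldName);
```

The WHERE clause skips NULL values and rows that contain no listed punctuation, which keeps the update smaller on large tables.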


I wanted to avoid creating a table and wanted to remove everything except letters and digits.

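DECLARE @InStr Varchar(250)   -- input string; the sample value below is only an illustration
DECLARE @p int
DECLARE @Result Varchar(250)
DECLARE @BadChars Varchar(12)
SET @InStr = 'Hello, World! 123'
-- [^a-z0-9] matches any character that is not a letter or digit;
-- under a case-insensitive collation a-z also covers A-Z
SELECT @BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT @BadChars = '%[^a-z0-9 ]%'

SET @Result = @InStr

SET @p = PatIndex(@BadChars,@Result)
WHILE @p > 0 BEGIN
    -- drop the character at position @p
    SELECT @Result = Left(@Result,@p-1) + Substring(@Result,@p+1,250)
    SET @p = PatIndex(@BadChars,@Result)
END

SELECT @Result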


I am proposing 2 solutions

Solution 1: Make a noise table and replace the noises with blank spaces

e.g.

DECLARE @String VARCHAR(MAX)
DECLARE @Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET @String = 'hello! how * > are % u (: . I am ok :). Oh nice!'

INSERT INTO @Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '@',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)

SELECT @String = REPLACE(@String, Noise, ReplaceChars) FROM @Noise
SELECT @String Data

Solution 2: With a number table

DECLARE @String VARCHAR(MAX)
SET @String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'

;with numbercte as
(
 select 1 as rn
 union all
 select rn+1 from numbercte where rn<LEN(@String)
)
select REPLACE(FilteredData,'&#x20;',SPACE(1)) Data from 
(select SUBSTRING(@String,rn,1) 
from numbercte  
where SUBSTRING(@String,rn,1) not in('!','*','>','<','%','(',')',':','&','@','#','$')
order by rn -- keep the characters in their original order
for xml path(''))X(FilteredData)
option (maxrecursion 0) -- the default limit of 100 recursions fails on longer strings

Output (Solution 2; Solution 1 gives the same text but with a space in place of each removed character)

Data

hello  how   are  u  . I am ok . Oh nice

Note: I have listed only some of the noise characters. Add whichever ones you need.

Hope this helps


You can use regular expressions in SQL Server through CLR integration. Here is an article based on SQL Server 2005:

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx


I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.

Then you can use it on INSERT too...
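
As a rough sketch of that idea, the nested REPLACE calls from the question could live inside one scalar function. The function name dbo.fn_CleanString and the three characters it strips are assumptions made for illustration, and the INSERT assumes that any other columns of tblMyTable are nullable or have defaults.

```sql
-- Hypothetical helper: the name and the character list are assumptions.
CREATE FUNCTION dbo.fn_CleanString (@Input VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
    -- Strip commas, periods and single quotes in one place.
    RETURN REPLACE(REPLACE(REPLACE(@Input, ',', ''), '.', ''), '''', '');
END
GO

-- The same function then serves both UPDATE and INSERT.
UPDATE tblMyTable SET FieldName = dbo.fn_CleanString(FieldName);

INSERT INTO tblMyTable (FieldName)
VALUES (dbo.fn_CleanString('O''Brien, Jr.'));
```

Adding a character later means changing only the function body, not every query that cleans the field.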


I took Ken MC's solution and made it into a function that replaces all punctuation with a given string:

----------------------------------------------------------------------------------------------------------------
-- This function replaces all punctuation in the given string with the "replaceWith" string
----------------------------------------------------------------------------------------------------------------
IF object_id('[dbo].[fnReplacePunctuation]') IS NOT NULL
BEGIN
    DROP FUNCTION [dbo].[fnReplacePunctuation];
END;
GO
CREATE FUNCTION [dbo].[fnReplacePunctuation] (@string NVARCHAR(MAX), @replaceWith NVARCHAR(max))
RETURNS NVARCHAR(MAX)
BEGIN
    DECLARE @Result NVARCHAR(MAX) = @string; -- same type as the parameter, so Unicode characters survive
    DECLARE @BadChars Varchar(12) = '%[^a-z0-9]%'; -- to leave spaces, use '%[^a-z0-9 ]%'
    DECLARE @p int = PatIndex(@BadChars,@Result);
    DECLARE @searchFrom INT;
    DECLARE @indexOfPunct INT = @p;

    WHILE @indexOfPunct > 0 BEGIN
      SET @searchFrom = LEN(@Result) - @p;
      SET @Result = Left(@Result, @p-1) + @replaceWith + Substring(@Result, @p+1,LEN(@Result));
      SET @IndexOfPunct = PatIndex(@BadChars, substring(@Result, (LEN(@Result) - @SearchFrom)+1, LEN(@Result)));
      SET @p = (LEN(@Result) - @searchFrom) + @indexOfPunct;
    END
    RETURN @Result;
END;
GO
-- example:
SELECT dbo.fnReplacePunctuation('This is, only, a tést-really..', '');

Output:

Thisisonlyatéstreally


If it's a one-off thing, I would use a C# + LINQ snippet in LINQPad to do the job with regular expressions.

It is quick and easy and you don't have to go through the process of setting up a CLR stored procedure and then cleaning up after yourself.


Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).

Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:

http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print
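
On SQL Server 2017 or later, the built-in TRANSLATE function may be another way to avoid the long chain of REPLACE calls. The sketch below is an assumption-laden illustration rather than a tested recipe: the variable names, the sample input, and the '~' placeholder are all made up here, and the placeholder must be a character that never appears in your data, because it is removed as well.

```sql
-- Requires SQL Server 2017+ (TRANSLATE is not available in earlier versions).
DECLARE @Input VARCHAR(100) = 'hello! how * > are % u (: ok';
DECLARE @Chars VARCHAR(50)  = '!@#$%^&*()<>:"';  -- characters to strip

-- TRANSLATE needs a replacement string of the same length as @Chars,
-- so map every unwanted character to one placeholder, then delete it.
SELECT REPLACE(
           TRANSLATE(@Input, @Chars, REPLICATE('~', LEN(@Chars))),
           '~', '') AS Cleaned;
```

The same expression can be used in an UPDATE by putting the column name in place of @Input.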
