Removing repeated duplicated characters

2023-03-01 01:20 Q&A author:

I have a string in my stored procedure such as ',,,sam,,bob,' or ',,,'. I need to remove the repeated commas so that the first becomes 'sam,bob,' and a string consisting only of commas, such as ',,,', becomes ''. I can use only SQL Server functions. I'm using SQL Server 2008 and .NET 3.5.

Thanks in advance.

This works for strings that are exclusively commas or have up to 398 contiguous commas.

 SELECT 
     CASE 
         WHEN TargetString NOT LIKE '%[^,]%' 
             THEN '' /*The string is exclusively commas*/
         ELSE 
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/
            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/
            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/
            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/
         END
 FROM T

Add extra power-of-2 stages at the top to handle longer runs, or remove stages from the top if you need fewer. The comment beside each stage shows the smallest run length that the series fails to collapse, as it arrives at that stage.
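As a quick sanity check of the 398-comma limit, the sketch below pushes two made-up strings, one with a run of 398 commas and one with 399, through the same five stages. The variable names and the 'a'/'b' sentinels are illustrative assumptions, not part of the original answer.

```sql
-- Illustrative check only: @Run398 and @Run399 are made-up test values.
DECLARE @Run398 VARCHAR(8000) = 'a' + REPLICATE(',', 398) + 'b';
DECLARE @Run399 VARCHAR(8000) = 'a' + REPLICATE(',', 399) + 'b';

SELECT CommaRun  = LEN(s.TargetString) - 2,  -- minus the 'a' and 'b' sentinels
       Collapsed = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(s.TargetString,
                   REPLICATE(',',16),','),
                   REPLICATE(',',8),','),
                   REPLICATE(',',4),','),
                   REPLICATE(',',2),','),
                   REPLICATE(',',2),',')
  FROM (SELECT @Run398 UNION ALL SELECT @Run399) AS s (TargetString);
```

Following the remainder arithmetic in the comments, the 398-comma run should come back as 'a,b', while the 399-comma run should keep two commas ('a,,b').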

All the comment lines are in this format

/*  L/D    =  Q remainder R */

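L:    The length of the run of commas arriving at this stage
D:    Corresponds to the length of the string generated by `REPLICATE`
Q:    The number of full D-length chunks, each replaced by a single comma
R:    The commas left over; in these worst-case comments it is always D-1
Q+R:  Form L for the next step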

So, to extend the series upwards with another REPLICATE(',',32),',') stage:

D = 32 
R = 31
Q = 368 (399-31)
L = (368 * 32) + 31 = 11807

So that would handle runs of up to 11,806 commas.
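The same arithmetic can be written out as a small script. This is only a sketch of the calculation above; the variable names are assumptions chosen to mirror the letters used in the comments.

```sql
-- Illustrative arithmetic only; the variable names are assumptions.
DECLARE @PrevL INT = 399;  -- smallest run the existing five stages fail on
DECLARE @D     INT = 32;   -- width of the new REPLICATE stage

DECLARE @R INT = @D - 1;       -- worst-case remainder for the new stage
DECLARE @Q INT = @PrevL - @R;  -- quotient chosen so that Q + R equals @PrevL

SELECT D = @D,
       R = @R,
       Q = @Q,
       L = (@Q * @D) + @R;  -- 368 * 32 + 31 = 11807
```

Changing @PrevL and @D to the values of the next stage up (11807 and 64, say) repeats the step for a further extension.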

I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
Returns VarChar(8000)
WITH SCHEMABINDING
AS
Begin

    Set @Data = @DuplicateChar + @Data

    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0
        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)

    Return Right(@Data, Len(@Data)-1)

End

You can test the function like this:

Select dbo.CleanDuplicates(',,,', ',')
Select dbo.CleanDuplicates(',,,sam,,bob,', ',')
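One caveat worth sketching: PATINDEX treats characters such as '%' and '_' as wildcards, so passing one of those as @DuplicateChar can leave the WHILE condition true even when REPLACE has nothing left to change. A possible variant, under the hypothetical name dbo.CleanDuplicatesLiteral, swaps in CHARINDEX so the character is always matched literally. It is otherwise the same loop as above.

```sql
-- Hypothetical variant of dbo.CleanDuplicates; the name is illustrative.
CREATE FUNCTION dbo.CleanDuplicatesLiteral
       (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS VARCHAR(8000)
WITH SCHEMABINDING
AS
BEGIN
    -- Prepend one copy so a leading run collapses into it and can be dropped.
    SET @Data = @DuplicateChar + @Data;

    -- CHARINDEX does a literal search, so '%' and '_' are not wildcards here.
    WHILE CHARINDEX(@DuplicateChar + @DuplicateChar, @Data) > 0
        SET @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar, @DuplicateChar);

    -- Remove the prepended character, as the original function does.
    RETURN RIGHT(@Data, LEN(@Data) - 1);
END
```

It can be exercised the same way, for example with Select dbo.CleanDuplicatesLiteral('a__b_c', '_').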

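Try this:

DECLARE @Parameter VARCHAR(8000) = ',,,sam,,bob,'  -- sample input

SELECT @Parameter AS 'BEFORE'
BEGIN
-- Keep collapsing ',,' into ',' until no double comma remains
WHILE CHARINDEX(',,', @Parameter) > 0
    BEGIN
        SELECT @Parameter = REPLACE(@Parameter, ',,',',') 
    END
SELECT @Parameter AS 'AFTER'
END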

George Mastros wrote:

I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

I agree that "memory only" Scalar UDFs are quite fast. In fact, I actually used one of George's Scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that "Set Based" code ISN'T always the best way to go.

However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unsubstantiated claim as to performance so let's heat this up with a performance demonstration.

To demonstrate, we first need some test data. A LOT of test data because both of the functions we're going to test run nasty fast. Here's the code to build a million row test table.

--===== Conditionally drop the test table 
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
GO
--===== Create and populate the test table on-the-fly.
     -- This builds a bunch of GUIDs and removes the dashes from them to 
     -- increase the chances of duplicating adjacent characters.
     -- Not to worry.  This takes less than 7 seconds to run because of
     -- the "Pseudo Cursor" created by the CROSS JOIN.
 SELECT TOP 1000000
        RowNum     = IDENTITY(INT,1,1),
        SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
GO

No need to repost George's fine function here, but I do need to post mine. The following function produces the same result as George's. It looks like an "iTVF" (Inline Table Valued Function), and it is, but it returns only one value. That's why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).

 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
 SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO
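Because the function is inline, it is called from the FROM clause (directly or through CROSS APPLY) rather than in the select list. A minimal usage sketch with the question's two sample strings follows; the derived table name src and its column name are assumptions made up for the example.

```sql
-- Call the inline function directly with a literal.
SELECT cleaned.Item
  FROM dbo.CleanDuplicatesJBM(',,,sam,,bob,', ',') AS cleaned;

-- Apply it to several rows at once (src and TargetString are assumed names).
SELECT src.TargetString, cleaned.Item
  FROM (VALUES (',,,sam,,bob,'), (',,,')) AS src (TargetString)
 CROSS APPLY dbo.CleanDuplicatesJBM(src.TargetString, ',') AS cleaned;
```

Working through the stages by hand, the first string should come back as 'sam,bob,'. The all-comma string should come back empty, matching the scalar version.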

First, let's test George's Scalar UDF. Please read the comments about why we're not using SET STATISTICS TIME ON here.

/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function 
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the returns from that "fiver" run...

Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.
(Average is 15,559 ms on my 10-year-old, single 1.8 GHz CPU)

Now, we'll run the "iSF" version...

/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because 
 we're trying to measure the performance of the function rather than how long 
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.ITEM
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the results from that run.

Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.
(Average is 7,006 ms {more than twice as fast} on my 10-year-old, single 1.8 GHz CPU)

My point ISN'T that George's code is bad. Not at all. In fact, I use Scalar UDFs when there is no "single query" solution. I'll also back George up by saying that "single query" solutions aren't always the best.

Just don't stop looking for them when it comes to UDFs. ;-)

Your solutions are good, but:

1. They handle commas only.
2. I hate loop-based T-SQL code ;-)

So, based on Marcin's solution, I wrote set-based, universal code that replaces runs of any declared characters:

DECLARE @Duplicate NVARCHAR(100)= '#$'
DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$$g'
DECLARE @Replacement NVARCHAR(MAX)= ''
DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
WITH    numbers
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        chars
          AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                        CAST(1 AS INT) [LEVEL]
               UNION ALL
               SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                        CAST(numbers.Number AS INT) [LEVEL]
               FROM     numbers
                        JOIN chars ON chars.Level + 1 = numbers.Number
               WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
             ),
        Replicated
          AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                        numbers.Number
               FROM     chars
                        CROSS JOIN numbers
             )
    SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM    replicated
    WHERE   number <= LEN(@TestString)

SELECT  @OutputString

You can list any characters in @Duplicate and any replacement string in @Replacement. An additional gain, IMHO, is that I only search for runs up to the length of the input string.

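You could try the following. Note that it only strips the leading commas (the ',,' between sam and bob is left as-is) and it assumes the data contains no spaces:

SELECT REPLACE(LTRIM(REPLACE(',,,sam,,bob,', ',', ' ')),' ', ',')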
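A related single-expression trick, not taken from this thread, collapses runs anywhere in the string by turning each comma into a two-character placeholder and deleting the seam where two placeholders meet. The sketch below assumes the data never contains '<' or '>' and that the expanded string still fits in 8000 characters. The variable name @s is made up for the example.

```sql
-- Sketch only: assumes the data never contains '<' or '>'.
DECLARE @s VARCHAR(8000) = ',,,sam,,bob,';

SELECT Cleaned =
       STUFF(                                    -- 4. drop the prepended comma
           REPLACE(                              -- 3. each surviving '<>' becomes ','
               REPLACE(                          -- 2. adjacent pairs meet as '><' and vanish
                   REPLACE(',' + @s, ',', '<>')  -- 1. every comma becomes '<>'
               , '><', '')
           , '<>', ',')
       , 1, 1, '');
```

For the sample value, the intermediate results are '<><><><>sam<><>bob<>', then '<>sam<>bob<>', then ',sam,bob,', and finally 'sam,bob,'.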
