开发者

TSQL Statement to move Name Suffix (Jr, Sr, IV, etc) into another field

I would like to have a TSQL Statement to move Name Suffix (Jr, Sr., IV, etc) into another field.

The suffixes I see are JR SR I II III IV V

Here is a sample of the data

LastName
BRUNNING, II
BURCH II
BUSS, JR.
CANI III
CHRISTIAN,SR
COLVIN Jr
COWHERD,JR.

I would like the suffix moved out of the LastName field into another field called Suffix.

LastName   Suffix  
BRUNNING   II
BURCH      I     
BUSS       JR
CANI       III
CHRISTIAN  SR
COLVIN     JR
COWHERD    JR

I am using SQL Server 2005 and can use SQL#开发者_JAVA百科 functions.

Any help would be greatly appretiated.


You can probably do better than this using the SQL# functions, but in straight T-SQL, here you go.

The main idea here is to parse out the last segment/token in the name using REVERSE and PATINDEX, and then match it to a list of known suffixes.

First some test data:

IF OBJECT_ID('tempdb..#names') IS NOT NULL DROP TABLE #names
IF OBJECT_ID('tempdb..#suffixes') IS NOT NULL DROP TABLE #suffixes
CREATE TABLE #names (name VARCHAR(32))
CREATE TABLE #suffixes (suffix VARCHAR(32))
GO

INSERT #names VALUES ('BRUNNING, II' )
INSERT #names VALUES ('BURCH II'     )
INSERT #names VALUES ('BUSS, JR.'    )
INSERT #names VALUES ('CANI III'     )
INSERT #names VALUES ('CHRISTIAN,SR' )
INSERT #names VALUES ('COLVIN Jr'    )
INSERT #names VALUES ('COWHERD,JR.'  )
INSERT #names VALUES ('BILLY BOB'    )
INSERT #names VALUES ('JOHNNY'       )

INSERT #suffixes VALUES ('II' )
INSERT #suffixes VALUES ('III')
INSERT #suffixes VALUES ('JR' )
INSERT #suffixes VALUES ('SR' )

Then, an inline SELECT version. Notice the use of NULLIF to control for SUBSTRING errors.

SELECT
  name
, left_segments 
, right_segment
, new_name = CASE WHEN b.suffix IS NOT NULL THEN a.left_segments ELSE a.name END
, b.suffix
FROM (
  SELECT 
    name
  , left_segments = CASE WHEN left_segments LIKE '%[ ,]' THEN LEFT(left_segments,LEN(left_segments)-1) ELSE left_segments END
  , right_segment = CASE WHEN right_segment LIKE '%[.]' THEN LEFT(right_segment,LEN(right_segment)-1) ELSE right_segment END
  FROM (
    SELECT * 
    , left_segments = RTRIM(LEFT(RTRIM(name),LEN(name)-NULLIF(PATINDEX('%[ ,]%',REVERSE(RTRIM(name))),0)))
    , right_segment = RIGHT(RTRIM(name),NULLIF(PATINDEX('%[ ,]%',REVERSE(RTRIM(name))),0)-1)
    FROM #names
    ) a
  ) a
LEFT JOIN #suffixes b ON a.right_segment = b.suffix

Alternately, UPDATE w/ local vars:

ALTER TABLE #names ADD 
  left_segments VARCHAR(64)
, right_segment VARCHAR(64)
GO

DECLARE 
  @name VARCHAR(64)
, @len INT
, @last_delim INT
, @left_segments VARCHAR(64)
, @right_segment VARCHAR(64)

UPDATE #names SET 
  @name           = RTRIM(name)
, @len            = LEN(@name)
, @last_delim     = @len-NULLIF(PATINDEX('%[ ,]%',REVERSE(@name)),0)
, @left_segments  = RTRIM(LEFT(@name,@last_delim))
, @right_segment  = RIGHT(@name,@len-@last_delim-1)
, @left_segments  = CASE WHEN @left_segments LIKE '%[ ,]' THEN LEFT(@left_segments,LEN(@left_segments)-1) ELSE @left_segments END
, @right_segment  = CASE WHEN @right_segment LIKE '%[.]'  THEN LEFT(@right_segment,LEN(@right_segment)-1) ELSE @right_segment END
, left_segments   = @left_segments
, right_segment   = @right_segment

SELECT a.*
, new_name = CASE WHEN b.suffix IS NOT NULL THEN a.left_segments ELSE a.name END
, suffix = b.suffix
FROM #names a LEFT JOIN #suffixes b ON a.right_segment = b.suffix

The inline SELECT is fairly convenient, but difficult to read and troubleshoot. I prefer the UPDATE with local vars for anything I might have to return to later. Plus, it makes individual edits easier to apply.

EDIT, SELECT method, slightly edited, and wrapped in an inline table-valued function. A inline TVF should be more efficient than a scalar UDF, and you get multiple return values to boot.

CREATE FUNCTION dbo.ParseNameAndSuffix (@name VARCHAR(64), @ValidSuffixes VARCHAR(512))
RETURNS TABLE AS RETURN (
  SELECT
    left_segments 
  , right_segment
  , new_name = CASE WHEN CHARINDEX(';'+right_segment+';',';'+@ValidSuffixes+';') > 0 THEN a.left_segments ELSE a.name END
  , suffix   = CASE WHEN CHARINDEX(';'+right_segment+';',';'+@ValidSuffixes+';') > 0 THEN a.right_segment END
  FROM (
    SELECT 
      name
    , left_segments = CASE WHEN left_segments LIKE '%[ ,]' THEN LEFT(left_segments,LEN(left_segments)-1) ELSE left_segments END
    , right_segment = CASE WHEN right_segment LIKE '%[.]' THEN LEFT(right_segment,LEN(right_segment)-1) ELSE right_segment END
    FROM (
      SELECT name
      , left_segments = RTRIM(LEFT(name,LEN(name)-NULLIF(PATINDEX('%[ ,]%',REVERSE(name)),0)))
      , right_segment = RIGHT(name,NULLIF(PATINDEX('%[ ,]%',REVERSE(name)),0)-1)
      FROM (SELECT name = LTRIM(RTRIM(@name))) a
      ) a
    ) a
  )
GO

SELECT * FROM #names a
CROSS APPLY dbo.ParseNameAndSuffix(a.name,'II;III;JR;SR') b


Off the top of my head, since you've got a small number of replacements, you could do something like this:

UPDATE [TableName] SET LastName = SUBSTRING(LastName,0, CHARINDEX(lastname,'III')), SUFFIX = 'III' WHERE CHARINDEX(lastname,'III') > 0;


You might want to do some pre-processing to make the formats more consistent.

Consider removing the ending period and replacing all commas with a space. After that, your sample should look like:

LastName
BRUNNING  II
BURCH II
BUSS  JR
CANI III
CHRISTIAN SR
COLVIN Jr
COWHERD JR

Then you can identify rows ending in ' I',' II',' III',' JR', and ' SR' and strip their suffix according to its length and update the Suffix field with the value you want.


If CLR is not an option then what Peter suggested is great. However, since you said that you have SQL# then you can do this in a simpler and more functional manner using the RegEx_MatchSimple function in SQL#. I will explain using Peter's example as the starting point.

We can set up the test using nearly the same SQL that Peter used but in this case I will create the Suffixes table as a real (non-Temp) table so that I can reference it in one of the example TVFs below. Maybe you might want to keep them in a table as opposed to passing in as a parameter, but I will show both styles. I also added two names to the #Names table to show how using RegEx can help capture variations in the data (extra spaces and/or commas):

USE [tempdb]
GO
IF OBJECT_ID('tempdb..#Names') IS NOT NULL DROP TABLE #Names
IF OBJECT_ID('tempdb.dbo.Suffixes') IS NOT NULL DROP TABLE dbo.Suffixes
CREATE TABLE #Names (LastName VARCHAR(32))
CREATE TABLE dbo.Suffixes (Suffix VARCHAR(32))
GO

INSERT #Names VALUES ('BRUNNING, II'  )
INSERT #Names VALUES ('BURCH II'      )
INSERT #Names VALUES ('BUSS, JR.'     )
INSERT #Names VALUES ('CANI III'      )
INSERT #Names VALUES ('CHRISTIAN,SR'  )
INSERT #Names VALUES ('COLVIN Jr'     )
INSERT #Names VALUES ('COWHERD,JR.'   )
INSERT #Names VALUES ('BILLY BOB'     )
INSERT #Names VALUES ('JOHNNY'        )
INSERT #Names VALUES ('BRUNNING, II ' )
INSERT #Names VALUES ('SMITH ,, SR. ' )

INSERT dbo.Suffixes VALUES ('II' )
INSERT dbo.Suffixes VALUES ('III')
INSERT dbo.Suffixes VALUES ('JR' )
INSERT dbo.Suffixes VALUES ('SR' )

The first thing to show is a simple example of it working with the above data. In this case I use a CTE to generate the list of matches against the names and then filter out the rows that did not match anything. I enclosed the [FullMatch] field in colons so it would be easier to see the leading and trailing spaces being captured:

;WITH cte AS (
    SELECT  names.LastName,
            [SQL#].[SQL#].RegEx_MatchSimple(names.LastName, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
            suff.suffix
    FROM    #Names names
    CROSS JOIN tempdb.dbo.Suffixes suff
)
SELECT  cte.LastName, ':' + cte.FullMatch + ':' AS [FullMatch], REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
FROM    cte
WHERE   cte.FullMatch <> ''

You can carry this theory over to a direct UPDATE statement:

;WITH cte AS (
    SELECT  names.LastName,
            [SQL#].[SQL#].RegEx_MatchSimple(names.LastName, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
            suff.Suffix
    FROM    MyTable names
    CROSS JOIN NameSuffixes suff
)
UPDATE  mt
SET     mt.LastName = REPLACE(cte.LastName, cte.FullMatch, ''),
        mt.NameSuffix = cte.Suffix
FROM    MyTable mt
INNER JOIN  cte
        ON  cte.LastName = mt.LastName
WHERE   cte.FullMatch <> ''

You had requested this as a function so that is as follows:

CREATE FUNCTION dbo.ParseNameAndSuffix (@Name VARCHAR(64))
RETURNS TABLE AS RETURN
(
    WITH cte AS (
        SELECT  @Name AS [LastName],
                [SQL#].[SQL#].RegEx_MatchSimple(@Name, '(([ ]*,+[ ]*)|([ ]+))' + suff.Suffix + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
                suff.Suffix
        FROM    tempdb.dbo.Suffixes suff
    )
    SELECT  cte.LastName, cte.FullMatch, REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
    FROM    cte
    WHERE   cte.FullMatch <> ''
)
GO

And can be used like so:

SELECT  *   
FROM    #Names a
CROSS APPLY dbo.ParseNameAndSuffix(a.LastName) b

-- or --

UPDATE  mt
SET     mt.LastName = REPLACE(parse.LastName, parse.Found, ''),
        mt.NameSuffix = parse.Suffix
FROM    MyTable mt
CROSS APPLY dbo.ParseNameAndSuffix(mt.LastName) parse

To more closely match the example given by Peter that passed in the suffixes as a parameter, that can be done as follows using the String_Split function in SQL#:

CREATE FUNCTION dbo.ParseNameAndSuffix2 (@Name VARCHAR(64), @Suffixes VARCHAR(MAX))
RETURNS TABLE AS RETURN
(
    WITH cte AS (
        SELECT  @Name AS [LastName],
                [SQL#].[SQL#].RegEx_MatchSimple(@Name, '(([ ]*,+[ ]*)|([ ]+))' + suff.Val + '[.]*[ ]*$', 1, 'IgnoreCase') AS [FullMatch],
                suff.Val AS [Suffix]
        FROM    [SQL#].[SQL#].String_Split(@Suffixes, ';', 2) suff
    )
    SELECT  cte.LastName, cte.FullMatch, REPLACE(cte.LastName, cte.FullMatch, '') AS [Replacement], cte.Suffix
    FROM    cte
    WHERE   cte.FullMatch <> ''
)
GO

which can then be used as follows:

SELECT * FROM #Names a
CROSS APPLY dbo.ParseNameAndSuffix2(a.LastName, 'II;III;JR;SR') b

-- or --

UPDATE  mt
SET     mt.LastName = REPLACE(parse.LastName, parse.Found, ''),
        mt.NameSuffix = parse.Suffix
FROM    MyTable mt
CROSS APPLY dbo.ParseNameAndSuffix2(mt.LastName, 'II;III;JR;SR') parse


I think your best bet is going to be a RegEx match for the last word (excluding punctuation marks) being in a list (JR, Sr, III, etc)

Check out this blog

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜