Fuzzy matching on string

I have a question about matching strings in an MSSQL database. I have a table that contains ICD9 and CPT codes, but the codes usually arrive in an incorrect format (e.g. too many characters, a missing decimal point, etc.). I need to be able to look up the description for each of these codes from a lookup table containing the correct codes.

Because of the way these codes are structured, I can do some type of "progressive" match to at least find the category of the code.

Let's say the correct code is something like: 306.98

And for this example, let's pretend there are no other values between 306 and 307.

I would like to strip the decimal and look for a match, one character at a time, until one is not found. Then select the last matching string.

So 306, 3069, 30698, 306981, 3069812, etc. would all match the string 306.98.

I hope that makes sense to everyone. I am not sure how I would even begin to do this, so any suggestion would be a great help.


One possible solution is to strip the code down to its basic element (306) and then use the LIKE operator:

WHERE Code LIKE '306%'
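
To make the suggestion above concrete, here is a minimal sketch of deriving that basic element from a badly formatted input before applying LIKE. It assumes the category is always the first three characters once the decimal point is removed, which may not hold for every code family. The variable names and the inline sample rows are hypothetical.

```sql
-- Hypothetical raw input: too many digits, decimal point in the wrong place
DECLARE @Raw NVARCHAR(20) = N'30.69812';

-- Assumption: the category is the first three characters after removing the dot
DECLARE @Category NVARCHAR(3) = LEFT(REPLACE(@Raw, N'.', N''), 3);

-- Inline sample rows stand in for the real lookup table
SELECT L.Code
FROM   (VALUES (N'306.98'), (N'307.0')) AS L(Code)
WHERE  L.Code LIKE @Category + N'%';
```

This only narrows the search to a category. If several codes share the category, every one of them is returned.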


Use the FLOOR function to strip the decimal part, and then use the LIKE operator in the WHERE clause.

Something like:

SELECT <COLUMN-LIST>
  FROM <TABLE-NAME>
 WHERE <THE-COLUMN> LIKE CAST(FLOOR(306.09) AS VARCHAR) + '%'


Here is an example. You just need to convert your value to NVARCHAR and assign it to @string.

DECLARE @string AS NVARCHAR (MAX) = '306.98';
DECLARE @Table TABLE (
    TextVal NVARCHAR (MAX));

INSERT INTO @Table ([TextVal])
SELECT '4444656'
UNION ALL
SELECT '30'
UNION ALL
SELECT '3069'
UNION ALL
SELECT '306989878787'
;

-- numbers: a tally of 1..N built by cross joining a system catalog view
WITH   numbers
AS     (SELECT ROW_NUMBER() OVER ( ORDER BY (SELECT 1)) AS Number
        FROM   [sys].[objects] AS o1 CROSS JOIN [sys].[objects] AS o2),
       -- Chars: one row per character of @string
       Chars
AS     (SELECT SUBSTRING(@string, [Number], 1) AS Let,
               [Number]
        FROM   [numbers]
        WHERE  [Number] <= LEN(@string)),
       -- Joined: every prefix of @string, built recursively, with the '.' skipped
       Joined
AS     (SELECT [Let],
               CAST (1 AS BIGINT) AS Number
        FROM   chars
        WHERE  [Number] = 1
        UNION ALL
        SELECT [J].[Let] + CASE 
                           WHEN [Chars].[Let] = '.' THEN '' ELSE [Chars].[Let] 
                           END AS Let,
               Chars.[Number]
        FROM   [Joined] AS J
               INNER JOIN
               [Chars]
               ON [Chars].[Number] = [J].[Number] + 1)
-- Match rows equal to any prefix, or rows that start with the full stripped string
SELECT *
FROM   @Table AS T
WHERE  [T].[TextVal] IN (SELECT [Let]
                         FROM   [Joined])
          OR [T].[TextVal] LIKE (SELECT TOP 1 [Let] FROM
          [Joined] ORDER BY [Number] DESC )  +'%'            
                         ;

Result will be:

 TextVal
30
3069
306989878787


I was able to figure it out. Basically, I just needed to step through each character of the string and look for a match until one was no longer found. Thanks for the help!

/* ICD9 Lookup */

USE TSiData_Suite_LWHS_V11

DECLARE @String NVARCHAR (20)
DECLARE @Match NVARCHAR(10)
DECLARE @Substring NVARCHAR (20)
DECLARE @Description NVARCHAR(MAX) 
DECLARE @Length INT
DECLARE @Count INT

SET @String = '309.99999999'

/* Remove decimal place from string */
SET @String = REPLACE(@String,'.','')

/* Get length of string */
SET @Length = LEN(@String)

/* Initialize count */
SET @Count = 1

/* Get Substring */
SET @Substring = SUBSTRING(@String,1,@Count)

/* Start processing */
IF (@Length < 1 OR @String IS NULL)
    /* Validate @String */
    BEGIN

        SET @Description = 'No match found for string. String is not proper length.'

    END
ELSE IF ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') < 1)
    /* Check for at least one match */
    BEGIN

        SET @Description = 'No match found for string.'

    END
ELSE
    /* Look for matching code */
    BEGIN

        /* Seed the result from the first character, in case the loop below never runs */
        SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC

        WHILE ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') <> 1 AND (@Count < @Length + 1))
        BEGIN

            /* Update substring value */
            SET @Substring = SUBSTRING(@String,1,@Count + 1)

            /* Increment @Count */
            SET @Count += 1

            /* Select the first matching code and get description */
            SELECT TOP(1) @Match =  LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC

        END
    END

PRINT @Match
PRINT @Description
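
For comparison, the same "longest matching prefix" idea can probably be written as a single set-based query instead of a WHILE loop. The following is only a sketch under stated assumptions: the table variable @Lookup, its columns Code and Name, and the placeholder descriptions are hypothetical stand-ins for the real lookup table, and the inline list of lengths 1 to 10 assumes the cleaned input is never longer than ten characters.

```sql
-- Hypothetical raw input with extra trailing digits and no decimal point
DECLARE @Raw   NVARCHAR(20) = N'3069812';
DECLARE @Clean NVARCHAR(20) = REPLACE(@Raw, N'.', N'');

-- Stand-in for the real lookup table (placeholder descriptions, not real ICD9 text)
DECLARE @Lookup TABLE (Code NVARCHAR(10) PRIMARY KEY, Name NVARCHAR(100));
INSERT INTO @Lookup (Code, Name)
VALUES (N'305.1',  N'Placeholder description A'),
       (N'306.98', N'Placeholder description B'),
       (N'307.0',  N'Placeholder description C');

-- One row per prefix of the cleaned input: '3', '30', '306', ...
WITH Prefixes AS (
    SELECT v.n AS PrefixLength,
           LEFT(@Clean, v.n) AS Prefix
    FROM   (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v(n)
    WHERE  v.n <= LEN(@Clean)
)
-- Keep the code matched by the longest prefix; ties go to the lowest code
SELECT TOP (1) l.Code, l.Name, p.PrefixLength
FROM   Prefixes AS p
       INNER JOIN @Lookup AS l
       ON REPLACE(l.Code, N'.', N'') LIKE p.Prefix + N'%'
ORDER  BY p.PrefixLength DESC, l.Code ASC;
```

With the sample rows above, the longest prefix that still matches is 30698, so the query should return 306.98. Like the loop version, it applies REPLACE to the lookup column, so an index on that column will not be used for the match. It also assumes the input contains no LIKE wildcard characters such as % or _.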