Fuzzy matching on string
I have a question about matching strings in an MSSQL database. I have a table that contains ICD9 and CPT codes. The problem is that these codes usually arrive in the wrong format (too many characters, a missing decimal point, etc.). I need to be able to look up the description for each of these codes in a lookup table that contains the correct codes.
Because of the way these codes are structured I can do some type of "progressive" match to at least find the category of the code.
Let's say the correct code is something like: 306.98
And for this example, let's pretend there are no other values between 306 and 307.
I would like to strip the decimal and look for a match, one character at a time, until one is not found. Then select the last matching string.
So 306, 3069, 3098, 306981, 3069812, etc. would match the string 306.98.
I hope that makes sense to everyone. I am not sure how I would even begin to do this, so any suggestion would be a great help.
One possible solution is to strip the code down to its basic element (306) and then use the LIKE operator:
WHERE Code LIKE '306%'
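As a minimal sketch of that idea, the query below strips the decimal from both the incoming value and the stored code, then compares on the first three characters. The table variable @Codes, its columns, and the sample rows are hypothetical, and treating the first three characters as the category is an assumption that fits numeric ICD-9 codes such as 306.98 but may not fit other code sets.

```sql
-- Hypothetical lookup table and sample rows, for illustration only.
DECLARE @Codes TABLE (Code VARCHAR(10), Description VARCHAR(100));
INSERT INTO @Codes (Code, Description)
VALUES ('306.98', 'Sample description A'),
       ('307.10', 'Sample description B');

-- A badly formatted incoming code (no decimal, extra digits).
DECLARE @Input VARCHAR(20) = '3069812';

-- Assumption: the category is the first three characters once decimals are removed.
SELECT Code, Description
FROM @Codes
WHERE REPLACE(Code, '.', '') LIKE LEFT(REPLACE(@Input, '.', ''), 3) + '%';
```

With these sample rows, only the 306.98 row should come back. Note that wrapping the column in REPLACE generally prevents an index on Code from being used for the search.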
Use the FLOOR function to strip the decimal part, then use the LIKE operator in the WHERE clause. This assumes the value you start from is numeric.
Something like:
SELECT <COLUMN-LIST>
FROM <TABLE-NAME>
WHERE <THE-COLUMN> LIKE CAST(FLOOR(306.09) AS VARCHAR) + '%'
Here is an example. You just need to convert your value to NVARCHAR and assign it to @string.
DECLARE @string AS NVARCHAR (MAX) = '306.98';
DECLARE @Table TABLE (
TextVal NVARCHAR (MAX));
INSERT INTO @Table ([TextVal])
SELECT '4444656'
UNION ALL
SELECT '30'
UNION ALL
SELECT '3069'
UNION ALL
SELECT '306989878787'
;
WITH numbers
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS Number
    FROM [sys].[objects] AS o1 CROSS JOIN [sys].[objects] AS o2),
Chars
AS (SELECT SUBSTRING(@string, [Number], 1) AS Let,
           [Number]
    FROM [numbers]
    WHERE [Number] <= LEN(@string)),
Joined
AS (SELECT [Let],
           CAST (1 AS BIGINT) AS Number
    FROM chars
    WHERE [Number] = 1
    UNION ALL
    SELECT [J].[Let] + CASE
                           WHEN [Chars].[Let] = '.' THEN '' ELSE [Chars].[Let]
                       END AS Let,
           Chars.[Number]
    FROM [Joined] AS J
         INNER JOIN
         [Chars]
         ON [Chars].[Number] = [J].[Number] + 1)
SELECT *
FROM @Table AS T
WHERE [T].[TextVal] IN (SELECT [Let]
                        FROM [Joined])
      OR [T].[TextVal] LIKE '%' + (SELECT TOP 1 [Let]
                                   FROM [Joined]
                                   ORDER BY [Number] DESC) + '%'
;
Result will be:
TextVal
30
3069
306989878787
I was able to figure it out. Basically, I just needed to step through each character of the string and look for a match until one was no longer found. Thanks for the help!
/* ICD9 Lookup */
USE TSiData_Suite_LWHS_V11
DECLARE @String NVARCHAR (10)
DECLARE @Match NVARCHAR(10)
DECLARE @Substring NVARCHAR (10)
DECLARE @Description NVARCHAR(MAX)
DECLARE @Length INT
DECLARE @Count INT
SET @String = '309.99999999'
/* Remove decimal place from string */
SET @String = REPLACE(@String,'.','')
/* Get length of string */
SET @Length = LEN(@String)
/* Initialize count */
SET @Count = 1
/* Get Substring */
SET @Substring = SUBSTRING(@String,1,@Count)
/* Start processing */
IF (@Length < 1 OR @String IS NULL)
/* Validate @String */
BEGIN
SET @Description = 'No match found for string. String is not proper length.'
END
ELSE IF ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') < 1)
/* Check for at least one match */
BEGIN
SET @Description = 'No match found for string.'
END
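ELSE
/* Look for matching code */
BEGIN
    /* Record the match for the first character in case the loop below never runs */
    SELECT TOP(1) @Match = LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC
    WHILE ((SELECT COUNT(*) FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%') <> 1 AND (@Count < @Length + 1))
    BEGIN
        /* Update substring value */
        SET @Substring = SUBSTRING(@String,1,@Count + 1)
        /* Increment @Count */
        SET @Count += 1
        /* Select the first matching code and get description; if nothing matches, the variables keep the last match found */
        SELECT TOP(1) @Match = LookupCodeDesc, @Description = LookupName FROM ICD9Lookup WHERE REPLACE(LookupCodeDesc,'.','') LIKE @Substring + '%' ORDER BY LookupCodeDesc ASC
    END
END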
PRINT @Match
PRINT @Description
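For comparison, the same "longest matching prefix" idea can probably be written as a single set-based query instead of a loop. This is only a sketch under assumptions: the table variable @Lookup and its sample rows are hypothetical stand-ins for the real lookup table, and sys.all_objects is used merely as a convenient source of rows for generating the numbers 1 through LEN(@Input).

```sql
-- Hypothetical lookup table standing in for the real one.
DECLARE @Lookup TABLE (Code VARCHAR(10), Name VARCHAR(100));
INSERT INTO @Lookup (Code, Name)
VALUES ('306.98', 'Sample description A'),
       ('309.9',  'Sample description B'),
       ('309.81', 'Sample description C');

-- Incoming code with the decimal removed.
DECLARE @Input VARCHAR(20) = REPLACE('309.99999999', '.', '');

WITH Numbers AS (
    -- One row per candidate prefix length: 1 .. LEN(@Input).
    SELECT TOP (LEN(@Input))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_objects
)
-- Keep the code that matches the longest prefix of the input;
-- ties are broken by the lowest code, as in the loop above.
SELECT TOP (1) L.Code, L.Name, N.N AS MatchedLength
FROM Numbers AS N
INNER JOIN @Lookup AS L
    ON REPLACE(L.Code, '.', '') LIKE LEFT(@Input, N.N) + '%'
ORDER BY N.N DESC, L.Code ASC;
```

With these sample rows the query should return 309.9, because '3099' is the longest prefix of the input that still matches a stored code. One caveat: if the input can contain LIKE wildcards such as % or _, they would need to be escaped first.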