开发者

Find "Similar" Part Numbers

I hope you'll find this interesting! C# is the preferred language for solutions, or T-SQL.

Consider the following items:

Item 1: NJ2-12GM50-Wö-V13

Item 2: NJ2-12GM50-Wo-V13

You can guess that the individual entering Item 1 copied and pasted, and the individual entering Item 2 just used 'o', the closest english character he could find.

When somebody enters a new part number like NJ2-12GM50-Wo-V13, we want to suggest to them that they might mean NJ2-12GM开发者_高级运维50-Wö-V13.

In order to do so, we want to convert all characters that can be stored as a VARCHAR back to their simplist lower case letter. I'm interested in a solution that converts, for example, ASCII character 246 (ö) to ASCII character 111 (o).

The idea being that if we make a lookup column containing all the converted versions of the partnumbers, we can easily match them to keyboard entry.

Of course, I can make a Dictionary to do it, but I wonder if there is a smarter way.

Thanks!


Probably your best bet is to implement a function in either C# or T-SQL to calculate the Levenshtein distance between the two part numbers.


Use the COLLATE clause to coerce the strings to both case and accent insensitive

IF 'NJ2-12GM50-Wö-V13' COLLATE Latin1_General_CI_AI
       =
   'NJ2-12GM50-Wo-V13' COLLATE Latin1_General_CI_AI
      PRINT 'matches'
ELSE
      PRINT 'no match

So, you can use it something like to validate user input

SELECT
   PartNo AS DidYouMeanThis,
   @Input AS WhenYouEnteredThis
FROM
   MyPartNumbers
WHERE
   PartNo COLLATE Latin1_General_CI_AI = @Input COLLATE Latin1_General_CI_AI

using a different COLLATE you can then ensure an exact match on write...

IF NOT EXISTS (SLECT * FROM MyPartNumbers
            WHERE PartNo COLLATE Latin1_General_BIN = @Input Latin1_General_BIN)
   RAISERROR ('Oi! I asked DidYouMeanThis', 16, 1)
ELSE
   INSERT ...


I can't add this as a comment for some reason, but you may consider, instead (or as well as), a matching of "commonly searched for" mappings. Such a system would probably be useful in general (i.e. "red hanger -> AB-999X", etc) and may solve this problem for you. Worth considering, anyway.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜