SQL function to determine the most accurate result
If I have a table like this...
create table #words (
id int identity,
word varchar(1024)
)
insert into #words (word) values ('dock')
insert into #words (word) values ('dockable')
and I do a LIKE query:
select id, word from #words where word like '%dock%'
Is there a way to tell which result would be the most accurate?
For complex multi-word criteria you should use Full Text Search and CONTAINSTABLE. The output of this table function contains a RANK column:
The table produced by CONTAINSTABLE includes a column named RANK. The RANK column is a value (from 0 through 1000) for each row indicating how well a row matched the selection criteria. This rank value is typically used in one of these ways in the SELECT statement:
- In the ORDER BY clause to return the highest-ranking rows as the first rows in the table.
- In the select list to see the rank value assigned to each row.
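As a rough sketch of how that RANK column can be used, the example below joins CONTAINSTABLE back to the base table and sorts by rank. It assumes a permanent table, since as far as I know a full-text index cannot be created on a temp table like #words. The names dbo.Words, PK_Words and WordsCatalog are made up for illustration.

```sql
-- Hypothetical permanent copy of the #words table; full-text indexing
-- needs a unique, single-column, non-nullable key index (PK_Words here).
CREATE TABLE dbo.Words (
    id   int IDENTITY NOT NULL CONSTRAINT PK_Words PRIMARY KEY,
    word varchar(1024) NOT NULL
);
INSERT INTO dbo.Words (word) VALUES ('dock'), ('dockable');

-- Catalog and index names are assumptions, not part of the original question.
CREATE FULLTEXT CATALOG WordsCatalog;
CREATE FULLTEXT INDEX ON dbo.Words (word)
    KEY INDEX PK_Words ON WordsCatalog;

-- "dock*" is a prefix term, so it can match both 'dock' and 'dockable'.
-- [KEY] is the key column value of the matching row, [RANK] is its score.
SELECT w.id, w.word, ft.[RANK]
FROM dbo.Words AS w
INNER JOIN CONTAINSTABLE(dbo.Words, word, '"dock*"') AS ft
    ON w.id = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

The full-text index is populated in the background, so the query may return no rows if it runs immediately after the index is created.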
For simple single-word criteria you should implement a Levenshtein distance function in SQL CLR and use it to find the most similar words (or use the one from Ken Redler's linked project).
You could try using similarity metrics to get a distance score for each result as compared to the search string. SOUNDEX and the like give you some primitive options, but there are much more sophisticated alternatives, depending on your requirement. The SimMetrics library of functions allows you to compare strings by Hamming distance, Levenshtein distance, etc. Here's a thorough article describing the installation and usage of the library.
You can use the SOUNDEX and DIFFERENCE T-SQL functions to compare words, but you may still need a way to determine which is "most accurate".
For example, run the following queries:
SELECT DIFFERENCE('dock','dock');
SELECT DIFFERENCE('dock','dockable');
DIFFERENCE of 'dock' and 'dock' is 4, which is the best possible result; 'dock' and 'dockable' gives 2, which indicates a weaker match. The scale runs from 0 to 4, and higher values mean the SOUNDEX codes are more alike.
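To turn that into a ranking, one option is to compute DIFFERENCE per row and sort on it. This is only a sketch against the #words table from the question; the column alias similarity and the tie-break on length are my own assumptions, not something DIFFERENCE provides.

```sql
-- Rank the LIKE matches by how close their SOUNDEX code is to 'dock'.
-- Higher DIFFERENCE values (max 4) sort first; shorter words break ties,
-- on the assumption that less extra text means a closer match.
SELECT id,
       word,
       DIFFERENCE(word, 'dock') AS similarity
FROM #words
WHERE word LIKE '%dock%'
ORDER BY similarity DESC, LEN(word) ASC;
```

Going by the figures above, 'dock' would sort ahead of 'dockable'. Since DIFFERENCE only has five possible values, many rows will tie on larger tables, so the secondary sort matters more than it looks here.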
I would look at using Full Text Search (FTS). CONTAINS is more precise than FREETEXT:
CONTAINS
WHERE CONTAINS(word, 'dock')
FREETEXT
WHERE FREETEXT (word, 'dock')
With a full-text index in place, these will be faster than LIKE, and FTS includes a rank value, computed by its own algorithm, that you can use to order matches. You'll have to test and see if the results fit your needs.