SQL function to determine the most accurate result

If I have a table like this...

create table #words (
id int identity,
word varchar(1024)
)

insert into #words (word) values ('dock')
insert into #words (word) values ('dockable')

and I run a LIKE query

select id, word from #words where word like '%dock%'

Is there a way to tell which result would be the most accurate?


For complex multi-word criteria you should use Full Text Search and CONTAINSTABLE. The output of this table function contains a RANK column:

The table produced by CONTAINSTABLE includes a column named RANK. The RANK column is a value (from 0 through 1000) for each row indicating how well a row matched the selection criteria. This rank value is typically used in one of these ways in the SELECT statement:

  • In the ORDER BY clause to return the highest-ranking rows as the first rows in the table.
  • In the select list to see the rank value assigned to each row.
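As a rough illustration of the RANK column described above, the query below joins CONTAINSTABLE back to the base table and sorts by rank. It is only a sketch under assumptions that are not in the question: a permanent table named dbo.words (full-text indexes cannot be created on temp tables such as #words) whose id column is the full-text key, and a prefix term so that both 'dock' and 'dockable' can match.

```sql
-- Assumption: dbo.words(id, word) is a permanent table with a full-text
-- index on word, and id is the unique key column of that index.
SELECT w.id,
       w.word,
       ft.[RANK] AS match_rank          -- 0 through 1000, higher is better
FROM dbo.words AS w
INNER JOIN CONTAINSTABLE(dbo.words, word, '"dock*"') AS ft
        ON w.id = ft.[KEY]              -- KEY holds the full-text key value
ORDER BY ft.[RANK] DESC;                -- best matches first
```

The rank values are only meaningful relative to other rows of the same query, so they are best used for ordering rather than as an absolute score.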

For simple single-word criteria you should implement a Levenshtein distance function in SQL CLR and use it to find the closest-matching words (or use the one from Ken Redler's linked project).
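To make the idea concrete without setting up CLR, here is a pure T-SQL sketch of Levenshtein distance. It is not the CLR implementation recommended above and will be much slower; the function name dbo.LevenshteinSketch is hypothetical. Character comparisons follow the database collation (so they are often case-insensitive), and LEN ignores trailing spaces.

```sql
-- Hypothetical helper: classic dynamic-programming Levenshtein distance.
-- Illustrative only; a CLR function is far faster on real data.
CREATE FUNCTION dbo.LevenshteinSketch (@s varchar(1024), @t varchar(1024))
RETURNS int
AS
BEGIN
    IF @s IS NULL OR @t IS NULL RETURN NULL;

    DECLARE @n int = LEN(@s), @m int = LEN(@t);
    -- cost at (i, j) = distance between the first i chars of @s
    -- and the first j chars of @t
    DECLARE @d TABLE (i int NOT NULL, j int NOT NULL, cost int NOT NULL,
                      PRIMARY KEY (i, j));
    DECLARE @i int = 0, @j int = 1, @c int;

    -- Borders: turning a prefix into the empty string costs its length.
    WHILE @i <= @n BEGIN INSERT @d VALUES (@i, 0, @i); SET @i += 1; END;
    WHILE @j <= @m BEGIN INSERT @d VALUES (0, @j, @j); SET @j += 1; END;

    SET @i = 1;
    WHILE @i <= @n
    BEGIN
        SET @j = 1;
        WHILE @j <= @m
        BEGIN
            -- 0 if the current characters match, otherwise 1 (substitution)
            SET @c = CASE WHEN SUBSTRING(@s, @i, 1) = SUBSTRING(@t, @j, 1)
                          THEN 0 ELSE 1 END;

            INSERT @d (i, j, cost)
            SELECT @i, @j, MIN(v)
            FROM (
                SELECT cost + 1 AS v FROM @d WHERE i = @i - 1 AND j = @j      -- deletion
                UNION ALL
                SELECT cost + 1      FROM @d WHERE i = @i     AND j = @j - 1  -- insertion
                UNION ALL
                SELECT cost + @c     FROM @d WHERE i = @i - 1 AND j = @j - 1  -- substitution
            ) AS x;

            SET @j += 1;
        END;
        SET @i += 1;
    END;

    RETURN (SELECT cost FROM @d WHERE i = @n AND j = @m);
END;
```

With that function in place, the rows from the question could be ordered by closeness to the search term, smallest distance first:

```sql
SELECT id, word, dbo.LevenshteinSketch(word, 'dock') AS distance
FROM #words
WHERE word LIKE '%dock%'
ORDER BY distance;
```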


You could try using similarity metrics to get a distance score for each result as compared to the search string. SOUNDEX and the like give you some primitive options, but there are much more sophisticated alternatives, depending on your requirement. The SimMetrics library of functions allows you to compare strings by Hamming distance, Levenshtein distance, etc. Here's a thorough article describing the installation and usage of the library.


You can use the SOUNDEX and DIFFERENCE T-SQL functions to compare words, but you may still need a way to determine which is "most accurate".

For example, run the following queries:

SELECT DIFFERENCE('dock','dock');
SELECT DIFFERENCE('dock','dockable');

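DIFFERENCE returns a value from 0 to 4, where 4 means the SOUNDEX codes of the two strings are most similar. 'dock' compared with 'dock' gives 4, which is the best result; 'dock' compared with 'dockable' gives 2, which indicates a weaker match.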
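Applied to the table from the question, the same function can be used to sort the LIKE results. This is a minimal sketch; the column alias similarity is just an illustrative name, and because the scale only has five values, ties are likely.

```sql
-- Order LIKE matches by SOUNDEX similarity to the search term.
SELECT id,
       word,
       DIFFERENCE(word, 'dock') AS similarity   -- 0 (least) to 4 (most similar)
FROM #words
WHERE word LIKE '%dock%'
ORDER BY similarity DESC;
```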


I would look at using Full Text Search (FTS). CONTAINS is more precise than FREETEXT.

CONTAINS

WHERE CONTAINS(word, 'dock') 

FREETEXT

WHERE FREETEXT (word, 'dock') 

With a full-text index in place, these will be faster than LIKE, and FTS can return a rank value for each match (through CONTAINSTABLE or FREETEXTTABLE) based on its ranking algorithm. You'll have to test and see whether the results fit your needs.
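Both predicates need a full-text index, which cannot be created on a temp table. The setup below is a sketch under assumed names (dbo.words, PK_words, words_catalog); it also assumes the Full-Text Search feature is installed on the instance.

```sql
-- Assumed permanent copy of the question's table; names are illustrative.
CREATE TABLE dbo.words (
    id   int IDENTITY NOT NULL,
    word varchar(1024) NOT NULL,
    CONSTRAINT PK_words PRIMARY KEY (id)   -- unique, single-column key for FTS
);

CREATE FULLTEXT CATALOG words_catalog;

CREATE FULLTEXT INDEX ON dbo.words (word)
    KEY INDEX PK_words
    ON words_catalog;

-- Once the index has been populated:
SELECT id, word FROM dbo.words WHERE CONTAINS(word, '"dock*"');
```

Note that full-text matching is word-based: CONTAINS(word, 'dock') matches the word "dock" but not "dockable", which is why the example uses the prefix term "dock*".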
