Meaning of Fuzzy Parameter in Lucene
As stated in the Lucene documentation, there is a parameter that specifies the similarity required for a match. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example: roam~0.8
Now I wonder whether this parameter is meant in a relative sense, i.e. for a longer string the edit distance may be higher and there is still a match. Or is it meant as an absolute value, i.e. only up to x substitutions/deletions/insertions are allowed for a match to happen?
A search for term~sim will find all terms which have an edit distance of less than length(term) * (1 - sim). So roam~0.8 will find all terms with an edit distance of less than 4 * (1 - 0.8) = 0.8 from roam.
EDIT:
The term must be longer than 1/(1 - sim) for the search to be fuzzy at all. So a search for roam~0.8 won't do anything fuzzy, because with a similarity of 0.8 the term must have a length of at least 5.
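To make the rule concrete, here is a minimal Python sketch of the formula as described in this answer. It is not Lucene's code: the names levenshtein and fuzzy_match are hypothetical, and the sketch assumes plain Levenshtein distance with a strict "less than" comparison, exactly as stated above.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, candidate: str, sim: float) -> bool:
    """Hypothetical helper: apply the rule from this answer.

    A candidate matches when its edit distance to the query term is
    strictly less than len(term) * (1 - sim).
    """
    threshold = len(term) * (1 - sim)
    return levenshtein(term, candidate) < threshold


# roam~0.8: the threshold is below 1, so only the exact term matches.
print(fuzzy_match("roam", "roam", 0.8))
print(fuzzy_match("roam", "foam", 0.8))
# roam~0.5: the threshold is 2, so one edit is allowed.
print(fuzzy_match("roam", "foam", 0.5))
```

Under these assumptions the threshold grows with the length of the query term, so the parameter is relative rather than an absolute number of edits.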