Meaning of Fuzzy Parameter in Lucene

As stated in the Lucene documentation, there is a parameter for specifying the similarity required for a match. Its value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example: roam~0.8

Now I wonder whether this parameter is meant in a relative sense, i.e. for a longer string the edit distance may be higher and still produce a match. Or is it meant as an absolute value, i.e. only up to x substitutions/deletions/insertions are allowed for a match?


It is relative to the length of the term. A search for term~sim will find all terms whose edit distance from term is less than length(term) * (1 - sim). So roam~0.8 will find all terms with an edit distance of less than 4 * (1 - 0.8) = 0.8 from roam.
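To make the rule concrete, here is a minimal Python sketch of the formula as stated in this answer. It is not Lucene's actual implementation: `levenshtein` and `fuzzy_match` are hypothetical helper names, and the sketch assumes the length that counts is that of the query term.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, candidate: str, sim: float) -> bool:
    """Apply the rule: edit distance < length(term) * (1 - sim)."""
    budget = len(term) * (1 - sim)
    return levenshtein(term, candidate) < budget
```

With this sketch, `fuzzy_match("roam", "foam", 0.8)` is false because one edit is not less than the budget of 0.8, while `fuzzy_match("roam", "foam", 0.5)` is true because the budget grows to 2.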

EDIT:

For any fuzzy matching to happen, the term must be longer than 1/(1 - sim). So a search for roam~.8 won't do anything fuzzy, because at a similarity of .8 a term has to be longer than 1/(1 - .8) = 5 characters before even a single edit is allowed.
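The length requirement can be checked with a short Python sketch. It reads the "less than" in the formula strictly, so a term needs length * (1 - sim) to exceed 1 before one edit fits; `min_fuzzy_length` is a hypothetical helper for illustration, not part of Lucene.

```python
def min_fuzzy_length(sim: float) -> int:
    """Smallest term length n for which n * (1 - sim) > 1,
    i.e. the shortest term that tolerates at least one edit."""
    if not 0 <= sim < 1:
        raise ValueError("sim must be in [0, 1)")
    n = 1
    # Grow the term until the edit budget strictly exceeds one edit.
    while n * (1 - sim) <= 1:
        n += 1
    return n
```

Under this reading, `min_fuzzy_length(0.8)` gives 6, which is why a four-letter term like roam only matches itself at that similarity.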
