Weighting the tokens generated from Lucene

I need a suitable weighting algorithm to return the most relevant tokens for a query. I have generated the tokens using Lucene 3.0 and am thinking of using the TF-IDF concept. Can someone suggest a better algorithm, or a modified TF-IDF?


Lucene already implements a TF-IDF variant for weighting. See: http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html
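To make the TF-IDF idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The `tf` and `idf` formulas mirror, as I understand it, the ones documented for `DefaultSimilarity` in that Lucene generation; the class `TfIdfSketch`, the helper `topTokens`, and the sample counts are hypothetical and only for illustration. Real Lucene scoring also applies length norms, boosts, and query normalization, which are left out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Lucene.
final class TfIdfSketch {

    // Term-frequency factor: square root of the raw count in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: terms that occur in fewer documents score higher.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Simplified weight of one token in one document (no norms or boosts).
    static double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    // Returns up to k tokens, ordered by descending weight.
    static List<String> topTokens(Map<String, Integer> termFreqs,
                                  Map<String, Integer> docFreqs,
                                  int numDocs, int k) {
        List<String> tokens = new ArrayList<>(termFreqs.keySet());
        tokens.sort((a, b) -> Double.compare(
                weight(termFreqs.get(b), docFreqs.getOrDefault(b, 0), numDocs),
                weight(termFreqs.get(a), docFreqs.getOrDefault(a, 0), numDocs)));
        return new ArrayList<>(tokens.subList(0, Math.min(k, tokens.size())));
    }

    public static void main(String[] args) {
        // Made-up counts: "the" is frequent everywhere, "lucene" is rare in the corpus.
        Map<String, Integer> termFreqs = new HashMap<>();
        termFreqs.put("the", 10);
        termFreqs.put("lucene", 3);
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 99);
        docFreqs.put("lucene", 4);
        System.out.println(topTokens(termFreqs, docFreqs, 100, 1));
    }
}
```

With these sample counts the rare token outranks the common one even though it occurs less often in the document, which is the behaviour TF-IDF is meant to give.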

However, this weighting is no longer state of the art and handles term bursts poorly. As far as I know, there are efforts to introduce pluggable scoring algorithms in Solr 4.0. For some versions, patches are available that add BM25 or some of the newer algorithms.
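For comparison, here is a rough sketch of BM25 term scoring in plain Java. It uses a commonly cited formulation with the usual default parameters `k1 = 1.2` and `b = 0.75`; it is not taken from any of the Lucene patches, and the class name `Bm25Sketch` and the sample numbers are assumptions for illustration only.

```java
// Hypothetical helper class for illustration; not the code of any Lucene patch.
final class Bm25Sketch {

    static final double K1 = 1.2;   // controls term-frequency saturation (assumed default)
    static final double B = 0.75;   // controls document-length normalization (assumed default)

    // Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of one term in one document.
    static double score(int freq, int docFreq, int numDocs,
                        double docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docFreq, numDocs) * (freq * (K1 + 1.0)) / (freq + norm);
    }

    public static void main(String[] args) {
        // Made-up corpus statistics: 100 documents, the term occurs in 5 of them.
        for (int freq : new int[] {1, 2, 5, 20, 100}) {
            System.out.println(freq + " -> " + score(freq, 5, 100, 50.0, 50.0));
        }
    }
}
```

The point of the design is that the term-frequency factor saturates: each extra occurrence adds less, and the score never exceeds `idf * (K1 + 1)`, whereas the square-root factor in classic TF-IDF keeps growing. Longer-than-average documents are also penalized through `B`.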
