Extending / changing how Zend_Search_Lucene searches

2022-12-29 20:30 问答作者：

I am currently using Zend_Search_Lucene to index and search a number of documents currently at around a 1000 or so. What I would like to do is change how the engine scores hits on a document, from the current default.

Zend_Search_Lucene scores on the frequency of number of hits within a document, so a document that has 10 matches of the word PHP will score higher than a document with only 3 matches of PHP. What I am trying to do is pass a number of key words and score depending on the hits of those keywords. e.g.

I pass 5 key words say,PHP, MySQL, Javascript, HTML and CSS that I search against the index. One document has 3 matches to those key words and one document has all 4 matches, the 4 matches scores the h开发者_高级运维ighest. The number of instances of those words in the document do not concern me.

Now I've had a quick look at Zend_Search_Lucene_Search_Similarity however I have to confess that I am not sure (or that bright) to know how to use this to achieve what I am after.

Is what I want to do possible using Lucene or is there a better solution out there?

For what I've understood in the Zend_Search_Lucene_Search_Similarity section of the manual, I'd start by extending the default similarity class to override the tf (term frequency) method so that it doesn't alter the score:

class MySimilarity extends Zend_Search_Lucene_Search_Similarity {    
    public function tf($freq) {
        return 1.0; // overriding default sqrt($freq);
    }
}

This way the number of matches shouldn't be taken into account. Do you think this would be enough?

Then, set it to be the default similarity algorithm before indexing:

Zend_Search_Lucene_Search_Similarity::setDefault(new MySimilarity());

继续阅读：lucene php search zend-framework zend-search-lucene

Extending / changing how Zend_Search_Lucene searches

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？