
Apache Lucene: Is Relevance Score Always Between 0 and 1?

Greetings,

I have the following Apache Lucene snippet that's giving me some nice results:

    int numHits = 100;
    int resultsPerPage = 100;
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(numHits, true);
    Query q = parser.parse(queryString);
    searcher.search(q, collector);
    // Fetch the first page of hits (page index 0)
    ScoreDoc[] hits = collector.topDocs(0 * resultsPerPage, resultsPerPage).scoreDocs;

    Results r = new Results();
    r.length = hits.length;
    for (int i = 0; i < hits.length; i++) {
        Document doc = searcher.doc(hits[i].doc);
        double distanceKm = getGreatCircleDistance(
                lucene2double(doc.get("lat")), lucene2double(doc.get("lng")),
                Double.parseDouble(userLat), Double.parseDouble(userLng));
        // Equivalent to -log2(score) / distanceKm
        double newRelevance = ((1 / distanceKm) * Math.log(hits[i].score) / Math.log(2)) * (0 - 1);
        System.out.println(hits[i].doc + "\t" + hits[i].score + "\t" + doc.get("content")
                + "\t" + "Km=" + distanceKm + "\trlvnc=" + String.valueOf(newRelevance));
    }

What I want to know is: is hits[i].score always between 0 and 1? It seems that way, but I can't be sure. I've even checked the Lucene documentation (class ScoreDoc) to no avail. You'll see I'm taking the log of hits[i].score to calculate the "newRelevance" value. I need hits[i].score to be between 0 and 1, because if it is below zero the log is undefined, and above 1 the sign of the log will change from negative to positive.

I hope some Lucene expert out there can offer me some insight.

Many thanks,


Yes, the score will always be between 0 and 1.

When Lucene calculates the score, it computes individual scores for term hits within fields and so on, then totals them. If the highest ranked hit has a total greater than 1, all of the document scores are normalised to be between 0 and 1, with the highest ranked document having a score of 1. If, however, no document's total is greater than 1, no normalisation occurs and the scores are returned as-is. This is why the top document sometimes has a score of 1 and other times has a score lower than 1.
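To make the rule described above concrete, here is a minimal sketch in plain Java. It does not call Lucene at all: the class name HitsStyleNormalizer and its normalize method are hypothetical, and the code only mirrors the behaviour as this answer describes it (scale by the top score when that score exceeds 1, otherwise leave the values alone).

```java
public final class HitsStyleNormalizer {
    private HitsStyleNormalizer() {}

    /**
     * Returns a copy of rawScores. The values are scaled so the top score
     * becomes 1 only when the top score exceeds 1; otherwise they are
     * returned unchanged. Hypothetical helper, not a Lucene API.
     */
    public static float[] normalize(float[] rawScores) {
        // Find the highest raw score in the result set.
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Scale only when the best raw score is greater than 1.
        float scale = max > 1f ? 1f / max : 1f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scale;
        }
        return out;
    }
}
```

With raw scores of 4, 2 and 1 this yields 1, 0.5 and 0.25, while raw scores of 0.8 and 0.4 come back untouched, which matches the "top score is sometimes 1 and sometimes lower" observation.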


EDIT: Having done a bit more research, the answer is most likely no. In the version of Lucene I am familiar with (v2.3.2), searches pass through the Hits object, whose GetMoreDocs() method normalises scores if any of them are greater than 1. In later versions this no longer appears to be the case, as the Hits class is no longer used. Whether your scores fall between 0 and 1 will depend on which version of Lucene you are using and which mechanism is used to search.

To quote from the Lucene mailing list:

The score is an arbitrary number > 0. It's not normalized to anything, it should only be used to e.g. sort the results
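If the score really is just an arbitrary positive number, one possible way to keep the log-based formula from the question well behaved is to divide each score by the top score of the same result set before taking the log. The sketch below is plain Java under that assumption. The class DistanceRelevance and its newRelevance method are hypothetical names, and rejecting a non-positive distance is my own choice (the original formula would divide by zero there).

```java
public final class DistanceRelevance {
    private DistanceRelevance() {}

    /**
     * Computes -log2(score / topScore) / distanceKm, the question's formula
     * applied to a score rescaled into (0, 1]. Hypothetical helper, not a
     * Lucene API.
     */
    public static double newRelevance(float score, float topScore, double distanceKm) {
        if (!(score > 0f) || !(topScore > 0f) || !(distanceKm > 0.0)) {
            throw new IllegalArgumentException("score, topScore and distanceKm must be > 0");
        }
        // Clamp to 1 so the log never turns positive, even with rounding.
        double ratio = Math.min(1.0, (double) score / topScore);
        double log2 = Math.log(ratio) / Math.log(2);
        // log2 is <= 0 here, so the result is >= 0.
        return -log2 / distanceKm;
    }
}
```

Here topScore would be the largest hits[i].score in the current result set. The rescaled values are still only comparable within that one result set.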


I believe that Lucene scores are always normalised, i.e. the top-scoring hits get 1 (or near to it). The values should then always be between 0 and 1. By extension, this means that the scores have no objective meaning, i.e. they cannot be compared with anything other than other hits from the same result set.

Disclaimer: I am not a Lucene Scientist. This is based only on my observations of Lucene in action. I've never seen this actually documented, so I may have got completely the wrong end of the stick.


The scores are between 1 and 0, but the top score does not have to be 1. Scores are always relative to one another, and a direct comparison should not really be made between scores of two different queries.
