How do I normalise a Solr/Lucene score?
I am trying to work out how to improve the scoring of Solr search results. My application needs to take the score from the Solr results and display a number of "stars" depending on how well each result matches the query: 5 stars for an exact or almost exact match, down to 0 stars for a result that does not match the search very well, e.g. only one element hits. However, I am getting scores ranging from 1.4 to 0.8660254, and both come from results that I would give 5 stars to. What I need to do is somehow turn these scores into a percentage so that I can mark each result with the correct number of stars.
The query that I run that gives me the 1.4 score is:
euallowed:true AND(grade:"2:1")
The query that gives me the 0.8660254 score is:
euallowed:true AND(grade:"2:1" OR grade:"1st")
I've already updated the Similarity so that tf and idf return 1.0, as I am only interested in whether a document contains a term, not in how many times the term occurs in the document. This is what my similarity code looks like:
import org.apache.lucene.search.Similarity;

public class StudentSearchSimilarity extends Similarity {

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    @Override
    public float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        //return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        return 1.0f;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
So I suppose my questions are:
What is the best way of normalising the score so that I can work out how many "stars" to give?
Is there another way of scoring the results?
Thanks
Grant
To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages:
People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs a "50%" match. This is also sometimes called a "normalized score".
Don't do this.
Seriously. Stop trying to think about your problem this way, it's not going to end well.
That page does give an example of how you could in theory do this, but it's very hard.
It's called a normalized score (Scores As Percentages).
You can use the following parameters to achieve that:
ns = {!func}product(scale(product(query({!type=edismax v=$q}),1),0,1),100)
fq = {!frange l=20}$ns
Where 20 is your 20% threshold.
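For illustration only, here is a minimal Java sketch of how a client might assemble those two parameters into a URL-encoded query string. The class name NormalisedScoreParams and the build method are hypothetical helpers, not part of Solr or SolrJ; only the ns and fq values come from the answer above, and the example user query is made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NormalisedScoreParams {

    // Hypothetical helper: returns the query-string part of a Solr select request.
    // The ns and fq expressions are copied from the answer; nothing else is Solr API.
    public static String build(String userQuery, int thresholdPercent) {
        String ns = "{!func}product(scale(product(query({!type=edismax v=$q}),1),0,1),100)";
        String fq = "{!frange l=" + thresholdPercent + "}$ns";
        return "q=" + encode(userQuery) + "&ns=" + encode(ns) + "&fq=" + encode(fq);
    }

    // Local params contain characters such as { } ! $ that must be URL-encoded.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example query (an assumption for the demo), with a 20% threshold.
        System.out.println(build("grade:\"2:1\"", 20));
    }
}
```

Changing the second argument changes the l= lower bound of the frange filter, i.e. the percentage threshold.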
See also:
Remove results below a certain score threshold in Solr/Lucene?
http://article.gmane.org/gmane.comp.jakarta.lucene.user/12076 http://article.gmane.org/gmane.comp.jakarta.lucene.user/10810
I've never had to do anything this complicated in Solr, so there may be a way to hook this in as a plugin, but you could handle it in the client when a result set is returned. If you've sorted by relevance this should be straightforward: get the relevance of the first result (max) and of the last (min). Then for each result with relevance x, you can calculate
normalisedValue = (x - min) / (max - min)
which will give you a value between 0 and 1. Multiply by 5 and round to get the number of stars.
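A minimal client-side sketch of that calculation in Java might look like the following. The class and method names (StarRating, toStars) are made up for illustration, and the handling of the case where every score is identical (max equals min, which would otherwise divide by zero) is an assumption, not something the formula above specifies.

```java
import java.util.Arrays;

public class StarRating {

    /** Min-max normalises each score into [0, 1], then scales and rounds to 0..5 stars. */
    public static int[] toStars(float[] scores) {
        if (scores.length == 0) {
            return new int[0];
        }
        // Find min and max explicitly so the input does not need to be sorted.
        float min = scores[0];
        float max = scores[0];
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float range = max - min;
        int[] stars = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Assumption: when all scores are equal, treat every result as a top match.
            float normalised = (range == 0f) ? 1f : (scores[i] - min) / range;
            stars[i] = Math.round(normalised * 5);
        }
        return stars;
    }

    public static void main(String[] args) {
        // 1.4 and 0.8660254 are the scores from the question; 1.1 is an invented middle value.
        float[] scores = {1.4f, 1.1f, 0.8660254f};
        System.out.println(Arrays.toString(toStars(scores)));
    }
}
```

Note that this rating is relative to the current result set: the lowest-scoring result always gets 0 stars and the highest always gets 5, so the stars cannot be compared across different queries.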