"Boosting" different instances of the same field in a Lucene document
I want to use a single field to index the document's title and body, in an effort to improve performance.
The idea was to do something like this:
Field title = new Field("text", "alpha bravo charlie", Field.Store.NO, Field.Index.ANALYZED);
title.setBoost(3.0f);
Field body = new Field("text", "delta echo foxtrot", Field.Store.NO, Field.Index.ANALYZED);
Document doc = new Document();
doc.add(title);
doc.add(body);
And then I could just do a single TermQuery instead of a BooleanQuery for two separate fields.
However, it turns out that a field's boost is the product of the boosts of all fields with the same name in the document. In my case, that means both the title and the body end up with a boost of 3.
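To make the effect concrete, here is a minimal plain-Java sketch of that behavior. It is a toy model, not Lucene code: the class `FieldBoostModel` and the method `combinedBoosts` are hypothetical names, and the only thing it assumes is the rule described above, that boosts of same-named field instances are multiplied into one value per field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (not Lucene): same-named field instances share one combined boost.
public class FieldBoostModel {

    // Multiplies together the boosts of all field instances that share a name.
    static Map<String, Float> combinedBoosts(String[] names, float[] boosts) {
        Map<String, Float> combined = new LinkedHashMap<String, Float>();
        for (int i = 0; i < names.length; i++) {
            Float current = combined.get(names[i]);
            combined.put(names[i], (current == null ? 1.0f : current) * boosts[i]);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Title instance boosted by 3, body instance left at the default of 1.
        String[] names = {"text", "text"};
        float[] boosts = {3.0f, 1.0f};
        System.out.println(combinedBoosts(names, boosts)); // prints {text=3.0}
    }
}
```

There is only one entry for "text", so terms from the title and terms from the body are scored with the same boost.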
Is there a way I can get what I want without resorting to using two different fields? One way would be to add the title field several times to the document, which increases the term frequency. This works, but seems incredibly brain-dead.
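As a rough illustration of how much repetition that workaround needs, the sketch below models only two factors of Lucene's default scoring: a term-frequency factor of sqrt(freq) and a length norm of 1/sqrt(numTerms). It is plain Java rather than Lucene code, the class and method names are hypothetical, and it deliberately ignores idf, query normalization, and the precision lost when norms are stored in a single byte.

```java
// Toy model (not Lucene): relative weight of a title term vs. a body term
// when the title is added to the same field several times.
public class RepeatedTitleModel {

    // DefaultSimilarity-style term frequency factor: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // DefaultSimilarity-style length norm: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Ratio of the title term's (tf * norm) to the body term's (tf * norm).
    static double titleToBodyRatio(int repeats, int titleTerms, int bodyTerms) {
        int numTerms = repeats * titleTerms + bodyTerms; // field grows with each copy
        double norm = lengthNorm(numTerms);              // shared by both terms
        double titleFactor = tf(repeats) * norm;         // title term occurs 'repeats' times
        double bodyFactor = tf(1) * norm;                // body term occurs once
        return titleFactor / bodyFactor;
    }

    public static void main(String[] args) {
        // Three-term title and three-term body, as in the example document.
        for (int repeats : new int[] {1, 3, 9}) {
            System.out.println(repeats + " copies -> ratio " + titleToBodyRatio(repeats, 3, 3));
        }
    }
}
```

Under these assumptions the shared length norm cancels out and the ratio grows only as the square root of the number of copies, so an effective 3x preference for title terms would take roughly nine copies of the title, not three.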
I also know about payloads, but that seems like overkill for what I'm after.
Any ideas?
If you want to take a page out of Google's book (at least their old book), then you may want to create separate indexes: one for document bodies, another for titles. I'm assuming there is a stored field that points to a true UID for each actual document.
The alternative answer is to write a custom implementation of [Similarity][1] to get the behavior you want. Unfortunately, I find that Lucene often needs this kind of customization when unique problems arise.
[1]: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String, int)
You can index title and body separately, with the title field boosted by the desired value. Then you can use MultiFieldQueryParser to search both fields.
Technically, searching multiple fields takes longer, but even with this overhead Lucene tends to be extremely fast (on the order of a few tens or hundreds of milliseconds).