"Boosting" different instances of the same field in a Lucene document

2023-01-20 06:55

I want to use a single field to index the document's title and body, in an effort to improve performance.

The idea was to do something like this:

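import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Both instances share the field name "text"; only the title is meant to be boosted.
Field title = new Field("text", "alpha bravo charlie", Field.Store.NO, Field.Index.ANALYZED);
title.setBoost(3.0f);
Field body = new Field("text", "delta echo foxtrot", Field.Store.NO, Field.Index.ANALYZED);
Document doc = new Document();
doc.add(title);
doc.add(body);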

And then I could just do a single TermQuery instead of a BooleanQuery for two separate fields.

However, it turns out that the effective boost of a field is the product of the boosts of all fields with the same name in the document. In my case, that means both instances end up with a boost of 3.
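To make that arithmetic concrete, here is a minimal sketch in plain Java that models the rule described above. It does not call Lucene, and the names `FieldBoostProduct` and `effectiveBoost` are made up for illustration.

```java
// Illustrative model only, not Lucene code. It mimics the rule described above:
// every instance of a field name contributes its boost to one shared product.
final class FieldBoostProduct {

    // Multiply the boosts of all instances that share one field name.
    static float effectiveBoost(float... instanceBoosts) {
        float product = 1.0f;
        for (float boost : instanceBoosts) {
            product *= boost;
        }
        return product;
    }

    public static void main(String[] args) {
        // Title instance boosted to 3, body instance left at the default of 1.
        float shared = effectiveBoost(3.0f, 1.0f);
        System.out.println("boost applied to every \"text\" instance: " + shared);
    }
}
```

Because the product is shared, there is no way in this model for the title instance to score differently from the body instance, which matches what the question observes.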

Is there a way to get what I want without resorting to two different fields? One option would be to add the title field to the document several times, which increases the term frequency. This works, but seems incredibly brain-dead.
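A rough sketch of what the repeat-the-title workaround costs, in plain Java. It assumes the scoring uses Lucene's classic default term-frequency factor, tf = sqrt(freq), which is an assumption about the setup and not something stated in the question. The names `RepeatedTitleTf`, `tf`, and `copiesForFactor` are hypothetical.

```java
// Illustrative model only, not Lucene code.
// Assumes the term-frequency factor is sqrt(freq), as in Lucene's classic default similarity.
final class RepeatedTitleTf {

    // Term-frequency factor for a term that occurs `freq` times in the field.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Smallest number of title copies whose tf factor reaches `targetFactor`,
    // for a term that appears once per title copy.
    static int copiesForFactor(double targetFactor) {
        return (int) Math.ceil(targetFactor * targetFactor);
    }

    public static void main(String[] args) {
        int copies = copiesForFactor(3.0);
        System.out.println("title copies needed for a 3x tf factor: " + copies);
        System.out.println("tf factor with that many copies: " + tf(copies));
    }
}
```

Under that assumption the number of copies grows with the square of the desired factor, and the extra copies also lengthen the field, which can shift length normalization for every term in it.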

I also know about payloads, but that seems like overkill for what I'm after.

Any ideas?

If you want to take a page out of Google's book (at least their old book), you may want to create separate indexes: one for document bodies, another for titles. I'm assuming there is a stored field that holds a true UID for each actual document.

The alternative is to write a custom implementation of [Similarity][1] to get the behavior you want. Unfortunately, I find that Lucene often needs this kind of customization when unique problems arise.

[1]: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String, int)

You can index the title and body as separate fields, with the title field boosted by the desired value. Then you can use MultiFieldQueryParser to search both fields.

Technically, searching multiple fields takes longer, but even with this overhead Lucene tends to be extremely fast (on the order of a few tens or hundreds of milliseconds).
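As a rough illustration of the two-field approach, the sketch below builds the kind of query string that searching both fields amounts to, using Lucene's `field:term^boost` query syntax. Note the swap: it applies the boost at query time rather than at index time as described above. It is plain Java with no Lucene dependency, the helper names (`MultiFieldQueryString`, `expand`) are hypothetical, and it assumes the term is already escaped for the query parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper only, not part of Lucene.
final class MultiFieldQueryString {

    // Expand one already-escaped term into "field:term^boost" clauses, one per field.
    // A boost of 1 is the default, so it is left out of the clause.
    static String expand(String term, Map<String, Float> fieldBoosts) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Float> entry : fieldBoosts.entrySet()) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(entry.getKey()).append(':').append(term);
            float boost = entry.getValue();
            if (boost != 1.0f) {
                query.append('^').append(boost);
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0f);
        boosts.put("body", 1.0f);
        System.out.println(expand("alpha", boosts));
    }
}
```

MultiFieldQueryParser also has a constructor that accepts a map of per-field boosts, so in practice the parser can do this expansion itself; the string form is shown only to make the resulting query visible.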
