Increasing the weight of particular terms (e.g. headings) when indexing documents in Lucene

I have documents which I am indexing with Lucene. These documents basically have a title (text) and body (text). Currently I am creating an index out of Lucene Documents with (amongst other fields) a single searchable field, which is basically title+" "+body. In this way, if you search for anything which occurs in the title or in the body, you will find the document.

However, now I have learned of the new requirement that matches in the title should cause the document to be "more relevant" than matches in the body. Thus, if there is a document with the title "Software design", and the user searches for "Software design", then that document should be placed higher up in the search results than a document called something else, which mentions software design a lot in the body.

I don't really have any idea how to begin implementing this requirement. I know that Google, for example, treats certain parts of a document as "more relevant" (e.g. text within <h1> tags), and everyone here assumes Lucene supports something similar.

However,

  • The Javadoc for the Document class clearly states that fields contain text, i.e. not structured text where some parts are "more important" than other parts.
  • This blog post states "With Lucene, it is impossible to increase or decrease the weight of individual terms in a document."

I'm not really sure where to look. What would you suggest?

Any specific information (e.g. links to Lucene documentation) stating flatly that such a thing is not possible would also be helpful, then I needn't spend any further time looking for how to do it. (The software is already written with Lucene, so we won't re-write it now, so if Lucene doesn't support it, then there's nothing anyone (my boss) can do about that.)


Just use two fields, title and body, and boost the title field while indexing:

title.setBoost(float)

see here


You should probably split the combined field into separate title and body fields, then use a query-time boost to give matches in the title field more weight.

The run-time query will look like:

title:apache^20 body:apache

see - http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Boosting%20a%20Term
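
For a multi-word search such as "Software design", the same pattern can be repeated for each word. The following is only a rough sketch of how such a query string might be assembled: BoostedQueryBuilder and build are hypothetical names, not part of Lucene, and the field names title and body are assumed to match the two fields suggested above. It uses plain Java string handling and does not escape query-syntax special characters, which real user input would need before being handed to the query parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a query string in the
// query parser syntax shown above, searching both fields and boosting
// matches in the title field.
class BoostedQueryBuilder {

    // Emits one boosted title clause and one plain body clause per word.
    // Assumes the index has separate "title" and "body" fields.
    static String build(String userInput, int titleBoost) {
        List<String> clauses = new ArrayList<>();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input produces no clauses
            }
            clauses.add("title:" + term + "^" + titleBoost);
            clauses.add("body:" + term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints: title:Software^20 body:Software title:design^20 body:design
        System.out.println(build("Software design", 20));
    }
}
```

The resulting string would then be passed to the query parser as usual. The boost of 20 is just the value from the example above; a suitable value depends on the documents and usually needs some experimentation.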
