In Lucene, can I search one index but use IDF from another one?

I'm building a system where I want to show only results indexed in the past few days. Furthermore, I don't want to maintain a giant index with a million documents if I only want to return results from a couple of days (thousands of documents).

On the other hand, my system relies heavily on the terms in the indexed documents having a realistic distribution (and consequently a realistic IDF).

That said, I would like to use a small index to return results, but compute document scores using an IDF from a much larger index (or even an external source).

The Similarity API doesn't seem to allow me to do this. The idf method does not receive the term being used as a parameter.

Another possibility is to use TrieRangeQuery to make sure the documents shown are within the last couple of days. Again, I would rather not maintain a larger index. Also, this kind of query is not cheap.


You should be able to extend IndexReader and override the docFreq() methods to provide whatever values you'd like. One thing this implementation can do is open two IndexReader instances: one for the small index and one for the large index. All the methods are delegated to the small IndexReader, except for docFreq(), which is delegated to the large index. You'll need to scale the returned value to the size of the small index, i.e.

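// Multiply before dividing: docFreq(t) / maxDoc() on its own is integer division and is almost always 0.
int myNewDocFreq = (int) ((long) bigIndexReader.docFreq(t) * smallIndexReader.maxDoc() / bigIndexReader.maxDoc());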
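
To make the scaling concrete, here is a minimal, self-contained sketch of the arithmetic only. It does not use Lucene: the IndexStats interface, the fixed() helper, and the ScaledDocFreq class are hypothetical stand-ins for the two IndexReader instances. Rounding, and keeping a document frequency of at least 1 for terms that do occur in the big index, are my own assumptions rather than part of the suggestion above.

```java
import java.util.Map;

public class ScaledDocFreq {

    /** Hypothetical stand-in for the two statistics read from an IndexReader. */
    interface IndexStats {
        int docFreq(String term);
        int maxDoc();
    }

    /** Builds an in-memory IndexStats from a term -> docFreq map (illustration only). */
    static IndexStats fixed(Map<String, Integer> freqs, int maxDoc) {
        return new IndexStats() {
            @Override public int docFreq(String term) { return freqs.getOrDefault(term, 0); }
            @Override public int maxDoc() { return maxDoc; }
        };
    }

    /**
     * Scales the big index's document frequency down to the size of the small index.
     * The computation is done in floating point so the ratio is not truncated to 0.
     */
    static int scaledDocFreq(IndexStats big, IndexStats small, String term) {
        int bigDocFreq = big.docFreq(term);
        if (big.maxDoc() == 0 || bigDocFreq == 0) {
            return 0;
        }
        long scaled = Math.round((double) bigDocFreq * small.maxDoc() / big.maxDoc());
        // Assumption: a term seen in the big index should not collapse to docFreq 0.
        if (scaled == 0) {
            scaled = 1;
        }
        // Never report more matching documents than the small index holds.
        return (int) Math.min(scaled, small.maxDoc());
    }

    public static void main(String[] args) {
        IndexStats big = fixed(Map.of("lucene", 50_000, "rare", 3), 1_000_000);
        IndexStats small = fixed(Map.of(), 2_000);
        System.out.println(scaledDocFreq(big, small, "lucene"));
        System.out.println(scaledDocFreq(big, small, "rare"));
    }
}
```

In a real subclass, the overridden docFreq() would return this scaled value while every other method is delegated to the small reader, as described above.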