Using FieldSelector when searching with Lucene

I'm searching articles in PubMed via Lucene. Each of the 20,000,000 articles has an abstract with ~250 words and an ID.

At the moment I store the results of my searches, each of which takes multiple seconds, in a TopDocs object. Searches can find thousands of articles. I'm only interested in the ID of each article. Does Lucene load the abstracts internally into the TopDocs?

If so, can I prevent that behavior through FieldSelectors, or do FieldSelectors only work with IndexReader and not with IndexSearcher?


No, Lucene does not load the values of fields into TopDocs. TopDocs only contains the doc number and score for each one of the matching documents.

If you're having performance issues, here's another SO question that can help you:

Optimizing Lucene performance


Lucene, by default, does not load any stored fields. If you want to retrieve only the ID field, and if you can afford to load up all the IDs in memory, then you can load all values as follows and reuse them.

String[] allIDs = FieldCache.DEFAULT.getStrings(indexReader, "IDFieldName");

For more on FieldCache, see the answer to this question: Best way to retrieve certain field of all documents returned by a Lucene search
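To illustrate how the cached array would be used once it is loaded, here is a minimal sketch in plain Java. It assumes the array is indexed by Lucene document number, so each hit's doc number can be used directly as an index. The class name CachedIdLookup, the method idsForHits, and the sample IDs are hypothetical; the int[] stands in for the doc numbers you would read from the ScoreDoc entries of your TopDocs.

```java
import java.util.ArrayList;
import java.util.List;

public class CachedIdLookup {

    // allIDs stands in for the String[] loaded once via FieldCache
    // (assumption: one entry per document, indexed by doc number).
    // hitDocNumbers stands in for the doc numbers taken from the search hits.
    static List<String> idsForHits(String[] allIDs, int[] hitDocNumbers) {
        List<String> ids = new ArrayList<>(hitDocNumbers.length);
        for (int doc : hitDocNumbers) {
            // Plain array lookup: no stored fields are read from disk here.
            ids.add(allIDs[doc]);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        String[] allIDs = {"PMID-100", "PMID-101", "PMID-102", "PMID-103"};
        int[] hitDocNumbers = {2, 0, 3};
        System.out.println(idsForHits(allIDs, hitDocNumbers));
    }
}
```

The trade-off is memory: with 20,000,000 articles the array holds one string per document, so this only pays off if the IDs fit comfortably in the heap and the array is reused across searches.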


You're on the right lines.

Try using a SetBasedFieldSelector when you retrieve the document from the index.

As another poster noted, iterating through the hits gives you ScoreDoc objects. Each one carries the Lucene document number (not your article ID), which you can use to retrieve the document through the IndexReader associated with the IndexSearcher.

If IO is a problem because you are loading fields you aren't interested in, restricting the load to the ID field this way should bring a pleasant improvement.

Hope this helps.
