Lucene docID reliability
Hi,
If only insert operations occur on a Lucene index (no deletes or updates), is it true that the docID does not change? And if so, is it reliable? I want to use it to load the FieldCache incrementally, to lower the overhead of loading all documents. What is the best solution for that?

I'm not quite sure what you're planning to do with the field cache, but my understanding of document IDs is that they can change during an insert, depending on pending deletes, merge policies, etc.
In other words, a document ID should not be relied upon across a commit boundary, i.e. once the index reader has been reopened.
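A toy model may make the merge point concrete. The sketch below is hypothetical and does not use Lucene's API. It treats an index as an ordered list of segments and takes a document's global docID to be its position when the segments are concatenated. It also assumes that a merge puts the merged segment where the first merged segment was. Under those assumptions, both purging deleted documents and merging non-adjacent segments renumber documents that were never touched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not Lucene's API): an index is an ordered list of
// segments, and a document's global docID is its position when all segments
// are concatenated in order.
public static class ToyIndex
{
    // Global docID of `key`, or -1 if it is not in the index.
    public static int GlobalDocId(List<List<string>> segments, string key)
    {
        return segments.SelectMany(s => s).ToList().IndexOf(key);
    }

    // Merges segments i and j (i < j) into one new segment, dropping deleted
    // docs. The merged segment takes the position of segment i (an assumption
    // of this model), and segment j disappears.
    public static List<List<string>> Merge(
        List<List<string>> segments, int i, int j, ISet<string> deleted)
    {
        if (i >= j) throw new ArgumentException("expected i < j");
        var merged = segments[i].Concat(segments[j])
                                .Where(k => !deleted.Contains(k))
                                .ToList();
        var result = new List<List<string>>();
        for (int idx = 0; idx < segments.Count; idx++)
        {
            if (idx == i) result.Add(merged);
            else if (idx != j) result.Add(segments[idx]);
        }
        return result;
    }
}
```

For example, with segments [a, b], [c], [d], deleting b and merging the first two segments moves c from docID 2 to 1. Merging [a] and [c] out of [a], [b], [c] without any deletes moves b from docID 1 to 2.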
Hope this helps,
The document ID is static within a segment. IndexReader.Open (usually) opens a DirectoryReader, which combines several SegmentReaders. You'll need to pass the "bottom" (segment-level) reader to the FieldCache for the population to work correctly.
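Per-segment population is what makes incremental loading possible. This can be shown with a small, hypothetical cache that is not Lucene.Net's FieldCache. For simplicity it keys entries by a segment name, whereas the real FieldCache keys them on the reader object. The segment names `_0`, `_1`, `_2` and the loader delegate are made up for the example. When a reopened reader still contains the old segments, only the new segment triggers a load.

```csharp
using System;
using System.Collections.Generic;

// Toy model (hypothetical, not Lucene.Net's FieldCache): values are cached per
// segment, keyed here by a segment name, so reopening after an append-only
// commit only loads the segments that were not seen before.
public sealed class PerSegmentCache
{
    private readonly Dictionary<string, int[]> cache = new Dictionary<string, int[]>();

    // Number of times a loader actually ran (i.e. cache misses).
    public int Loads { get; private set; }

    public int[] Get(string segmentName, Func<int[]> load)
    {
        if (!cache.TryGetValue(segmentName, out var values))
        {
            values = load();
            cache[segmentName] = values;
            Loads++;
        }
        return values;
    }

    // Loads (or reuses) the values for every segment of one reader snapshot.
    public List<int[]> LoadAll(IEnumerable<string> segmentNames, Func<string, int[]> loader)
    {
        var result = new List<int[]>();
        foreach (var name in segmentNames)
            result.Add(Get(name, () => loader(name)));
        return result;
    }
}
```

Loading `_0` and `_1`, then reopening after a commit that added `_2`, runs the loader three times in total rather than five.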
Here's an example from "FieldCache with frequently updating index". It passes the FieldCache only the segment reader that contains the document, instead of the topmost reader (which is considered changed at every commit).
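```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var reader = IndexReader.Open(directory, readOnly: true);
var documentId = 1337; // top-level (composite) docID

// Grab all segment-level subreaders, in docID order.
var subReaders = new List<IndexReader>();
ReaderUtil.GatherSubReaders(subReaders, reader);

// Walk the subreaders, subtracting each one's MaxDoc() until the remaining id
// falls inside the current subreader (local ids run from 0 to MaxDoc() - 1).
var subReaderId = documentId;
var subReader = subReaders.First(sub => {
    if (subReaderId >= sub.MaxDoc()) {
        subReaderId -= sub.MaxDoc();
        return false;
    }
    return true;
});

// Populate the FieldCache from the segment reader, not the top-level reader.
var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
var value = values[subReaderId];
```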
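The linear First() scan is fine for a handful of segments. A common alternative is to precompute each segment's starting docID from the subreaders' MaxDoc() values and binary-search over those starts. The sketch below takes that approach under that assumption. DocIdMapper, DocStarts and Resolve are hypothetical names, not Lucene.Net APIs.

```csharp
using System;

// Hypothetical helper (not part of Lucene.Net): maps a top-level docID to
// (segment index, segment-local docID) given each subreader's MaxDoc().
public static class DocIdMapper
{
    // docStarts[i] is the first top-level docID belonging to segment i.
    public static int[] DocStarts(int[] maxDocs)
    {
        var starts = new int[maxDocs.Length];
        int sum = 0;
        for (int i = 0; i < maxDocs.Length; i++)
        {
            starts[i] = sum;
            sum += maxDocs[i];
        }
        return starts;
    }

    // Binary search for the last segment whose start is <= docId.
    public static (int Segment, int LocalId) Resolve(int[] docStarts, int docId)
    {
        if (docStarts.Length == 0) throw new ArgumentException("no segments");
        if (docId < 0) throw new ArgumentOutOfRangeException(nameof(docId));
        int lo = 0, hi = docStarts.Length - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo + 1) / 2; // upper middle, so lo always advances
            if (docStarts[mid] <= docId) lo = mid;
            else hi = mid - 1;
        }
        return (lo, docId - docStarts[lo]);
    }
}
```

With MaxDoc() values of 100, 50 and 10, top-level docID 100 resolves to segment 1, local id 0, and 155 resolves to segment 2, local id 5.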