
Iterate through all undeleted Documents in a Lucene (.Net) index

I want to get the count of all undeleted documents in a Lucene (.Net 2.4) index and then read the stored fields of all, or a range, of these docs. After reading the Lucene help, I'm not sure whether IndexReader.NumDocs() returns the count of all docs or only the undeleted ones. Can I simply iterate through IndexReader.Document[], or does it contain deleted Documents too?

If NumDocs() and Document[] contain both deleted and undeleted docs, I suppose I'll have to do something like this:

int totalCount = reader.NumDocs();
int totalCountUndeleted = totalCount;
for (int iDoc = 0; iDoc < totalCount; iDoc++)
  if (reader.IsDeleted(iDoc))
    totalCountUndeleted--;

for (int iDoc = 0; iDoc < totalCount; iDoc++)
{
  if (!reader.IsDeleted(iDoc))
  {
     Document doc = reader.Document(iDoc);
     // read fields
  }
}

Is this the right way, or is there another way? Thanks


IndexReader.NumDocs() returns the number of active (undeleted) documents. IndexReader.MaxDoc() returns one greater than the largest document number in the index, so it also covers the slots of deleted documents. The following code reads all the active documents in the index.

int max = reader.MaxDoc();
for (int iDoc = 0; iDoc < max; iDoc++)
{
  if (!reader.IsDeleted(iDoc))
  {
     Document doc = reader.Document(iDoc);
     // read fields
  }
}
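
If only a range of the undeleted documents is needed, the same loop can be wrapped in an iterator. This is a rough sketch that assumes the Lucene.Net 2.4 method names used above (MaxDoc(), IsDeleted(), Document()); the names LiveDocs and Range and the skip/take parameters are made up for illustration.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class LiveDocs
{
    // Hypothetical helper: yields up to `take` undeleted documents,
    // after skipping the first `skip` undeleted ones.
    public static IEnumerable<Document> Range(IndexReader reader, int skip, int take)
    {
        int max = reader.MaxDoc();   // one greater than the largest document number
        int seen = 0;                // undeleted documents encountered so far
        for (int iDoc = 0; iDoc < max && seen < skip + take; iDoc++)
        {
            if (reader.IsDeleted(iDoc))
                continue;            // deleted slots still occupy a document number
            if (seen++ >= skip)
                yield return reader.Document(iDoc);
        }
    }
}
```

Something like foreach (Document doc in LiveDocs.Range(reader, 100, 50)) would then read the 101st through 150th undeleted documents, stopping early if the index has fewer.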


This is the correct way. Deleted documents are only flagged as deleted; they stay in the index, and keep their document numbers, until you optimize the index.

Alternatively, if you have a query like *:* which matches all documents, you can run that instead. The query approach will probably be somewhat slower, but it may be the more standard one.
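
A sketch of the query approach, assuming Lucene.Net 2.4, where MatchAllDocsQuery is the programmatic counterpart of *:*. The class and method names AllDocsByQuery and ReadAll are hypothetical, and the lowercase member names scoreDocs and doc are an assumption about the 2.x line (as far as I recall, later releases capitalize them).

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class AllDocsByQuery
{
    public static void ReadAll(IndexReader reader)
    {
        IndexSearcher searcher = new IndexSearcher(reader);

        // Request as many hits as there are undeleted documents.
        // At least 1 is requested because a hit count of 0 may be rejected.
        int n = Math.Max(1, reader.NumDocs());

        // A search only returns undeleted documents, so no IsDeleted check is needed.
        TopDocs top = searcher.Search(new MatchAllDocsQuery(), null, n);
        foreach (ScoreDoc sd in top.scoreDocs)
        {
            Document doc = searcher.Doc(sd.doc);
            // read fields
        }
    }
}
```

This collects every hit in memory at once, so for a large index the MaxDoc() loop is likely the lighter option.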
