How to get reliable docid from Lucene 3.0.3?

I would like to get the int docid of a Document I just added to a Lucene index so that I can stick it into a Filter to update a standing query. My documents have a unique external id, so I thought that doing a TermDocs enumeration on the unique id would return the correct document, like this:

protected int getDocId(IndexReader reader, String idField, Document doc) throws IOException {
    String id = doc.get(idField);
    TermDocs termDocs = reader.termDocs(new Term(idField, id));
    int docid = -1;
    while (termDocs.next()) {
        docid = termDocs.doc();
        Document aDoc = reader.document(docid);
        String docIdString = aDoc.get(idField);
        System.out.println(docIdString + ": " + docid);
    }
    return docid;
}

Unfortunately, this loops and loops, returning the same docIdString and increasing docids.

What is the recommended way to get the docids for newly-added documents so that I could use them in a Filter immediately after the documents are committed?


The doc ID of a document is not the same as the value in your id field. The doc ID is an internal Lucene identifier, which you probably shouldn't access. Your field is just a field - you can call it "ID", but Lucene won't do anything special with it.

Why are you trying to manually update the filter? Internal doc IDs are not stable: segment merges, which can be triggered as you add documents or commit, drop deleted documents and renumber the rest, so the IDs before will not necessarily be the same as the IDs afterwards. (This is just one example of the general point that you shouldn't rely on Lucene's internal IDs for anything.) So it isn't enough to add that one document to the filter; you need to rebuild the whole thing.
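As a rough illustration of why the numbers shift, here is a toy model in plain Java. It is not Lucene code, and the class and method names (DocIdShiftDemo, add, delete, merge, docIdOf) are made up for this sketch. It only assumes what is described above: doc IDs are positions, and a merge compacts away deleted documents.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (hypothetical names, no Lucene involved): a doc ID is a position in a list. */
public class DocIdShiftDemo {
    // Each slot holds a document's external id, or null once it has been deleted.
    private final List<String> slots = new ArrayList<>();

    /** Adds a document and returns the position it was assigned. */
    public int add(String externalId) {
        slots.add(externalId);
        return slots.size() - 1;
    }

    /** Marks a document as deleted; its slot stays occupied until the next merge. */
    public void delete(String externalId) {
        int i = slots.indexOf(externalId);
        if (i >= 0) {
            slots.set(i, null);
        }
    }

    /** Compacts the list: deleted slots vanish and later documents move down. */
    public void merge() {
        slots.removeIf(s -> s == null);
    }

    /** Looks up the current position of a document by its external id (-1 if absent). */
    public int docIdOf(String externalId) {
        return slots.indexOf(externalId);
    }

    public static void main(String[] args) {
        DocIdShiftDemo index = new DocIdShiftDemo();
        index.add("a");
        index.add("b");
        int before = index.add("c");
        index.delete("a");
        index.merge();
        System.out.println("c before merge: " + before);             // prints 2
        System.out.println("c after merge: " + index.docIdOf("c"));  // prints 1
    }
}
```

The external id "c" is the only handle that survives the merge, which is why looking documents up by your own id field is the safer habit.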

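To update a cached filter, express it as a query (for example, a query for "foo"), wrap that query in a QueryWrapperFilter, and wrap the result in a CachingWrapperFilter. The cached results are tied to the index reader, so a reopened reader gets them recomputed.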


EDIT: Because your id field is just a field, you do a search for it like you would anything else:

TopDocs results = searcher.search(new TermQuery(new Term("MyIdField", Id)), 1);
int internalId = results.scoreDocs[0].doc;

However, like I said, I think you want to ignore internal IDs. So I would build a filter from a query:

BooleanQuery filterQuery = new BooleanQuery(); // or get existing query from cache
filterQuery.add(new TermQuery(new Term("MyIdField", Id)), BooleanClause.Occur.SHOULD);
// add more sub queries for each ID you want in the filter here
Filter myFilter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));