Lucene.NET update not removing search terms
I'm using Lucene.NET for a project, and creating an index/searching the index is going great. However, when I update, I seem to only add to the search index, but never remove terms from the index. Rebuilding the index from scratch fixes things, but obviously I'd prefer not to do that every time somebody modifies a value.
- As an example, say we have indexed DocumentA with a field FieldB with the text "This is some text to index."
- Searching for the word "fantastic" yields no results.
- Now, we update FieldB of DocumentA to "This is some fantastic text to index."
- Searching for the word "fantastic" yields DocumentA as a result (as expected).
- Update FieldB of DocumentA to "This is some mediocre text to index."
- Searching for "mediocre" yields DocumentA as a result (as expected).
- Searching for "fantastic" still yields DocumentA as a result. This is not the behavior I expect or want.
Here is the method I'm using to update the document (class names changed to protect the innocent):
internal static void ModifyDocuments(IEnumerable<SomeFancyObject> changed)
{
    if (changed.Count() == 0) {
        return;
    }
    var dir = FSDirectory.Open(LuceneGlobals.directory);
    var writer = new IndexWriter(dir, LuceneGlobals.analyzer, false, new IndexWriter.MaxFieldLength(int.MaxValue));
    foreach (var fancyObj in changed) {
        //writer.DeleteDocuments(new Term("fancyID", fancyObj.ID.ToString()));
        //writer.AddDocument(CreateDocument(fancyObj));
        writer.UpdateDocument(new Term("fancyID", fancyObj.ID.ToString()), CreateDocument(fancyObj));
    }
    writer.Optimize();
    writer.Close();
}
Note that I have tried the code as written, and also the commented-out Delete/Add in place of the Update call. I also tried writer.Commit() in place of writer.Optimize().
Debugging reveals that the entire method is executed, and CreateDocument() creates a new document with the data I am expecting to see. Here's CreateDocument() for completeness:
private static Document CreateDocument(SomeFancyObject fancyObj)
{
    var doc = new Document();
    doc.Add(new Field("docType", "SomeFancyObject", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("fancyID", Convert.ToString(fancyObj.ID), Field.Store.YES, Field.Index.NO));
    doc.Add(new Field("fancyText", new StringReader(fancyObj.Text)));
    doc.Add(new Field("fancyTitle", new StringReader(fancyObj.Title)));
    return doc;
}
I'm seeing what I expect to see in fancyObj.Text and fancyObj.Title. Perhaps I'm not using all the options correctly here?
What needs to be done to keep my index from remembering data that has been updated away?
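You need to index fancyID (Field.Index.NOT_ANALYZED). IndexWriter.UpdateDocument removes all documents containing the given term, but no terms are generated for a field unless you index it. With Field.Index.NO, the ID is stored but never matched, so the old document is never deleted and each update just adds another copy.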
You could also look into reading the value from FieldCache, instead of storing it.
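As a rough sketch of the fix, here is the question's CreateDocument() with fancyID indexed as a single untokenized term. The SomeFancyObject class below is a hypothetical stand-in (the real type isn't shown in the question, and an int ID is an assumption), and the FancyIndexing wrapper class is only there to make the snippet compile on its own:

```csharp
using System;
using System.IO;
using Lucene.Net.Documents;

// Hypothetical stand-in for the asker's type; only the members used below.
// The int ID is an assumption, the question does not show the real type.
internal class SomeFancyObject
{
    public int ID { get; set; }
    public string Text { get; set; }
    public string Title { get; set; }
}

internal static class FancyIndexing
{
    internal static Document CreateDocument(SomeFancyObject fancyObj)
    {
        var doc = new Document();
        doc.Add(new Field("docType", "SomeFancyObject", Field.Store.YES, Field.Index.NOT_ANALYZED));

        // NOT_ANALYZED indexes the ID as one exact term, so that
        // new Term("fancyID", id) can match it in UpdateDocument/DeleteDocuments.
        doc.Add(new Field("fancyID", Convert.ToString(fancyObj.ID), Field.Store.YES, Field.Index.NOT_ANALYZED));

        doc.Add(new Field("fancyText", new StringReader(fancyObj.Text)));
        doc.Add(new Field("fancyTitle", new StringReader(fancyObj.Title)));
        return doc;
    }
}
```

Documents that were added while fancyID was not indexed still have no fancyID term, so UpdateDocument cannot find them. The index would most likely need one rebuild after this change before updates start replacing documents as expected.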