Removing duplicate documents from Zend Lucene indexes

My way of creating and optimizing indexes is to create and optimize a chunk of records at a time rather than converting everything in one go. The problem I am facing is that duplicate documents/records get created in the index. Is there any function or code for removing duplicates from the index? Thanks in advance.


You need to remove a record before you update it; that is how Lucene works. You cannot update an existing record in place.

This is how you delete a record:

// 'data/index' is the index directory that Lucene generated
$index = Zend_Search_Lucene::open('data/index');

// 'listing_id' is a field I added when first creating the index;
// $listing_id is the id value of the row I want to delete
$term  = new Zend_Search_Lucene_Index_Term($listing_id, 'listing_id');
$query = new Zend_Search_Lucene_Search_Query_Term($term);

$hits = $index->find($query);
foreach ($hits as $hit) {
    // $hit->id is not listing_id; it is Lucene's internal document id
    // for the document whose listing_id equals $listing_id
    $index->delete($hit->id);
}

Now you can do the update, which is basically an insert. That is how Lucene works.


You should have a term which is a unique identifier. Then, before you add a document to the index, you delete any existing document that carries the same identifier.

Duplicates are just instances in which you have multiple documents with the same unique id. So you would enumerate all the terms in your unique id field and look for the ones that match two or more documents. There is no built-in method to do this as far as I know.
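
The enumeration described above could look roughly like the sketch below. It assumes the Zend Framework 1 index methods terms(), termDocs(), isDeleted(), delete() and commit(), a unique id field named 'listing_id' (borrowed from the earlier answer), and that the copy with the highest internal document id is the newest one and the one worth keeping. The helper names idsToDelete() and removeDuplicates() are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: given the internal document ids that share one
 * unique-id term, return every id except the highest one.
 * Assumption: the highest internal id is the most recently added copy.
 */
function idsToDelete(array $docIds)
{
    $docIds = array_values(array_unique($docIds));
    sort($docIds, SORT_NUMERIC);
    array_pop($docIds); // drop the highest id from the list, i.e. keep that document
    return $docIds;
}

/**
 * Hypothetical helper: delete all but one document per value of $field.
 * $index is expected to be the object returned by Zend_Search_Lucene::open().
 * Returns the number of documents marked as deleted.
 */
function removeDuplicates($index, $field = 'listing_id')
{
    $deleted = 0;
    foreach ($index->terms() as $term) {
        if ($term->field !== $field) {
            continue; // skip terms that belong to other fields
        }
        // Collect the documents for this term that are not already deleted
        $live = array();
        foreach ($index->termDocs($term) as $docId) {
            if (!$index->isDeleted($docId)) {
                $live[] = $docId;
            }
        }
        foreach (idsToDelete($live) as $docId) {
            $index->delete($docId);
            $deleted++;
        }
    }
    $index->commit(); // make the deletes visible to later searches
    return $deleted;
}
```

Usage would be something like removeDuplicates(Zend_Search_Lucene::open('data/index')). Note that terms() loads every term of the index into memory, so on a large index this may be slow; it is meant as a one-off cleanup, not something to run on every update.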


Do NOT forget to call $index->commit() before you add any new data. Skipping it was the reason duplicate documents kept coming back from $index->find($query) in my case.

$index = Zend_Search_Lucene::open('/lucene/index');
$query = new Zend_Search_Lucene_Search_Query_Term(new Zend_Search_Lucene_Index_Term($id, 'key'));

$hits = $index->find($query);
foreach ($hits as $hit) {
    // $hit->id is not key; it is Lucene's internal id of the document that has key = $id
    $index->delete($hit->id);
}
$index->commit(); // apply the deletes before indexing new data

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::keyword('key', $id));
$doc->addField(Zend_Search_Lucene_Field::text('user', $user, 'utf-8'));
$index->addDocument($doc); // add the fresh copy of the document