Solr facet counts are not correct, how to deduplicate

We are using two Solr instances to index the files. Sometimes one article is indexed in both instances because we do updates. This causes a problem: the facet counts are not correct because of these duplicated articles. How can I de-duplicate the counts?


My advice would be not to keep duplicated articles. You need a method to identify these duplicate articles and delete them from one SOLR.

If you don't want to delete duplicate articles you still need to keep track of them. Knowing which articles from SOLR1 are duplicates in SOLR2 will help you de-duplicate the counts like this:

  • create an extra field in SOLR1 named IsDuplicateField:

    IsDuplicateField = true, if the article is duplicated in SOLR2
                     = false, otherwise
    
  • when you query SOLR1, add IsDuplicateField=true to the facets (for example, as a facet query).

  • when retrieving the result, subtract the IsDuplicateField count returned by SOLR1 from the total facet count.

In this setup, the IsDuplicateField facet counts all the articles that match your query and are also duplicated in SOLR2, which is exactly the number to subtract.
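The steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the helper names `build_solr1_params` and `deduplicated_total` are made up for the example, the responses are assumed to be Solr's standard JSON output already parsed into dictionaries, and the combined total is assumed to be the plain sum of the counts from SOLR1 and SOLR2.

```python
from urllib.parse import urlencode

# Facet query that counts matching documents flagged as duplicates.
# IsDuplicateField is the extra boolean field described above.
DUPLICATE_FACET_QUERY = "IsDuplicateField:true"


def build_solr1_params(user_query):
    """Build the query string for SOLR1 (hypothetical helper).

    Adds a facet query so the response reports how many matching
    articles are also present in SOLR2.
    """
    return urlencode({
        "q": user_query,
        "rows": 0,            # only the counts are needed here
        "wt": "json",
        "facet": "true",
        "facet.query": DUPLICATE_FACET_QUERY,
    })


def deduplicated_total(solr1_response, solr2_response):
    """Sum the counts from both instances, then subtract the duplicates.

    Both arguments are assumed to be parsed Solr JSON responses.
    """
    total = (solr1_response["response"]["numFound"]
             + solr2_response["response"]["numFound"])
    duplicates = solr1_response["facet_counts"]["facet_queries"][DUPLICATE_FACET_QUERY]
    return total - duplicates
```

For example, if SOLR1 reports 10 matches of which 3 are flagged as duplicates, and SOLR2 reports 7 matches, the de-duplicated total is 10 + 7 - 3 = 14. The same subtraction would have to be repeated for each facet value you display, using a duplicate count restricted to that value.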

Good luck!
