Faceted Search without duplication of data (no ETL)
All the solutions I've seen so far involve duplicating data into a NoSQL store or a data warehouse. Are there more efficient ways?
2011-06-07 EDIT: When I say no duplication, I mean no ETL either. I would like to extract data directly from the main database. It's relational, but there's still time for me to change that.
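For comparison, here is a minimal sketch of what "extracting facets directly from the main database" might look like: one `GROUP BY` count per facet, run against the live relational tables, so there is no second copy to keep in sync. The `products` table, its columns and the `ALLOWED_FACETS` whitelist are hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical schema for illustration only; substitute your own tables.
ALLOWED_FACETS = {"brand", "color"}

def facet_counts(conn, facet_column, where_sql="1=1", params=()):
    """Return {value: count} for one facet column over the rows matching where_sql.

    Column names cannot be bound as SQL parameters, so facet_column is checked
    against a whitelist before being interpolated. where_sql must be built from
    trusted fragments that use ? placeholders, with the values passed in params.
    """
    if facet_column not in ALLOWED_FACETS:
        raise ValueError(f"not a facetable column: {facet_column}")
    sql = (
        f"SELECT {facet_column}, COUNT(*) FROM products "
        f"WHERE {where_sql} "
        f"GROUP BY {facet_column} "
        f"ORDER BY COUNT(*) DESC, {facet_column}"
    )
    return dict(conn.execute(sql, params).fetchall())

def make_demo_db():
    """Build a tiny in-memory table standing in for the main database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, brand TEXT, color TEXT)"
    )
    conn.executemany(
        "INSERT INTO products (name, brand, color) VALUES (?, ?, ?)",
        [
            ("Shirt A", "Acme", "red"),
            ("Shirt B", "Acme", "blue"),
            ("Shirt C", "Zenith", "red"),
            ("Shirt D", "Zenith", "red"),
            ("Shirt E", "Acme", "red"),
        ],
    )
    return conn

conn = make_demo_db()
print(facet_counts(conn, "brand"))                          # {'Acme': 3, 'Zenith': 2}
print(facet_counts(conn, "color", "brand = ?", ("Acme",)))  # {'red': 2, 'blue': 1}
```

The trade-off is that every facet costs one aggregate query per search, so on large tables those queries need supporting indexes to stay fast.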
There is a patch for Solr that adds field collapsing. It works fairly well, except that problems have been reported when the returned result set runs to millions of documents.
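To make "field collapsing" concrete, here is a small in-memory sketch of the idea. It is not Solr's implementation: for each distinct value of a field, only the best-scoring document is kept. The document fields (`product_id`, `color`, `score`) are hypothetical. It also shows that facet counts taken before and after collapsing can differ.

```python
from collections import Counter

def collapse(docs, field):
    """Keep only the highest-scoring document for each distinct value of `field`.

    The surviving documents are returned in descending score order, like a
    ranked result list. On equal scores the first document seen is kept.
    """
    best = {}
    for doc in docs:
        key = doc[field]
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)

def facet(docs, field):
    """Count documents per value of `field` (a plain field facet)."""
    return dict(Counter(doc[field] for doc in docs))

# Hypothetical product variants: several documents share one product_id.
docs = [
    {"id": 1, "product_id": "p1", "color": "red", "score": 0.9},
    {"id": 2, "product_id": "p1", "color": "blue", "score": 0.7},
    {"id": 3, "product_id": "p2", "color": "red", "score": 0.8},
    {"id": 4, "product_id": "p3", "color": "green", "score": 0.5},
    {"id": 5, "product_id": "p3", "color": "red", "score": 0.6},
]

collapsed = collapse(docs, "product_id")
print([d["id"] for d in collapsed])  # [1, 3, 5]
print(facet(docs, "color"))          # {'red': 3, 'blue': 1, 'green': 1}
print(facet(collapsed, "color"))     # {'red': 3}
```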
Also, it doesn't calculate facet counts very precisely: sometimes the total of all the facet counts doesn't tally with the number of documents in the set. However, the difference never seems to be large; I've noticed fluctuations of less than 100 for result sets of 10,000 to 50,000 documents.
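To measure that discrepancy yourself, a helper like the one below could compare the facet total with the document count. This is a sketch that assumes Solr's JSON response writer with the default `json.nl=flat`, where each facet field comes back as a flat `[value, count, value, count, ...]` list. The comparison is only meaningful for a single-valued field where every document has a value (or `facet.missing=true` is set) and `facet.limit=-1`, since the default limit of 100 truncates the buckets. The function names and the sample numbers are made up for illustration.

```python
def facet_total(flat):
    """Sum the counts in a Solr 'flat' facet list: [value1, count1, value2, count2, ...]."""
    return sum(flat[1::2])

def facet_discrepancy(response, field, num_docs=None):
    """Return (sum of facet counts for `field`) minus the number of matching documents.

    Assumes the standard (non-grouped) JSON response layout. With a collapsed
    or grouped response the document count may live elsewhere, so it can be
    passed in explicitly via num_docs.
    """
    flat = response["facet_counts"]["facet_fields"][field]
    if num_docs is None:
        num_docs = response["response"]["numFound"]
    return facet_total(flat) - num_docs

# Illustrative response fragment with made-up numbers, not measured data.
sample = {
    "response": {"numFound": 10000, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"category": ["books", 6000, "music", 3950]},
    },
}
print(facet_discrepancy(sample, "category"))  # -50
```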
Obviously, to use this patch you'll have to build your own version of Solr. If you're not comfortable with that, you can try the prebuilt version I'm using. I have uploaded both a patched .war file and my "lib" folder to my SkyDrive (I'm not sure whether the latter is necessary or whether the patch changes any libraries, but it's there just in case). I should also mention that you use this version at your own risk: it has served me without any serious complaints, but I can't guarantee the same for others. Here's the download link.
Alternatively, you can wait for Solr 4 to be released. It will include field collapsing, but it still had unresolved critical issues the last time I checked. By the way, its collapsing parameters won't be compatible with the patch described above, so if you use the patch first and later switch to Solr 4, you'll need to amend your code as well.
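For reference, this is roughly what a collapsed, faceted request might look like using Solr 4's result grouping parameters (`group`, `group.field`, `group.ngroups`, `group.facet`), which are what client code would need to be amended to send. It is a sketch only. The base URL, core name and field names are hypothetical, and it builds the request URL without sending it.

```python
from urllib.parse import urlencode

def grouped_facet_url(base_url, query, group_field, facet_fields, rows=10):
    """Build a Solr 4 select URL that groups results and facets per group."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("rows", str(rows)),
        ("group", "true"),
        ("group.field", group_field),
        ("group.ngroups", "true"),  # also report how many groups matched
        ("group.facet", "true"),    # count facets per group instead of per document
        ("facet", "true"),
    ]
    # facet.field may be repeated once per facet.
    params += [("facet.field", f) for f in facet_fields]
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"

url = grouped_facet_url(
    "http://localhost:8983/solr/products",  # hypothetical host and core
    "red shirt",
    "product_id",                           # hypothetical grouping field
    ["brand", "color"],                     # hypothetical facet fields
)
print(url)
```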