How can I group Lucene's results?
My application indexes discussion threads. Each entry in a discussion is indexed as a separate Lucene document with a common_id field, which can be used to group search hits into one discussion.
Currently, when a search is performed and a thread has 3 matching entries, 3 separate hits are returned. Even though this is correct, from the user's point of view the same discussion appears in the results multiple times.
Is there a way to tell Lucene to group its search results by the common_id field before returning them?
I believe what you are asking for is Field Collapsing, which is a feature of Solr (and I believe Elasticsearch as well).
If you want to roll your own, one possible way to do this is:
- Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
- Make an initial query to Lucene, and get a hit list.
- For each hit, check whether it has a series id; if it does, make another query by that series id to retrieve all the members of the series.
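The three steps above can be sketched in plain Java. This is a minimal, hypothetical illustration rather than Lucene code: the `Entry` record, `findBySeriesId` and `expand` are names invented here, an in-memory list stands in for the index, and it assumes every entry carries a series id (as every entry carries common_id in the question). Against a real index, `findBySeriesId` would be a second Lucene query on the series-id field. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeriesExpansion {

    // Hypothetical stand-in for a stored document: the entry's own id and the
    // series id shared by every entry of the same discussion thread.
    record Entry(String id, String seriesId) {}

    // Stand-in for "query by series id". Against a real index this would be a
    // second Lucene query on the series-id field; here it scans an in-memory list.
    static List<Entry> findBySeriesId(List<Entry> index, String seriesId) {
        List<Entry> members = new ArrayList<>();
        for (Entry e : index) {
            if (seriesId.equals(e.seriesId())) {
                members.add(e);
            }
        }
        return members;
    }

    // Step 3: given the initial hit list from step 2 (best hit first), fetch each
    // series' members once. computeIfAbsent skips series already expanded, and
    // LinkedHashMap keeps series in the rank order of their first hit.
    static Map<String, List<Entry>> expand(List<Entry> index, List<Entry> hits) {
        Map<String, List<Entry>> groups = new LinkedHashMap<>();
        for (Entry hit : hits) {
            groups.computeIfAbsent(hit.seriesId(), sid -> findBySeriesId(index, sid));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Entry> index = List.of(
                new Entry("e1", "s1"), new Entry("e2", "s1"), new Entry("e3", "s1"),
                new Entry("e4", "s2"), new Entry("e5", "s3"));
        // Pretend the initial query matched e2, e4 and e1, in that rank order.
        List<Entry> hits = List.of(index.get(1), index.get(3), index.get(0));
        expand(index, hits).forEach((series, members) ->
                System.out.println(series + " -> " + members));
    }
}
```

Each series is looked up only once, however many of its entries matched, so three hits from two threads cost two follow-up queries rather than three.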
An alternative is to store the ids of all the series members in a field inside each member's document.
There is nothing built into Lucene that collapses results based on a field. You will need to implement that yourself.
However, this feature has recently been built into Solr.
See http://www.lucidimagination.com/blog/2010/09/16/2446/
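If you do implement it yourself on top of plain Lucene, one simple approach is to collapse the ranked hit list after the search, keeping only the best-ranked entry for each common_id. The sketch below is a hypothetical, framework-free illustration: the `Hit` and `Group` records and `collapse` are invented names, and it assumes the hits have already been read out of Lucene's results in rank order (best first), as a relevance-sorted search returns them. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HitCollapser {

    // Hypothetical view of one search hit: the entry's id, its common_id, and its score.
    record Hit(String docId, String commonId, float score) {}

    // One collapsed result per discussion: its best-ranked entry and how many of
    // its entries appeared in the hit list that was collapsed.
    record Group(Hit best, int matchCount) {}

    // Collapses hits (assumed sorted best-first) to at most maxGroups groups,
    // one per common_id, ordered by each group's best hit.
    static List<Group> collapse(List<Hit> hits, int maxGroups) {
        Map<String, Hit> best = new LinkedHashMap<>(); // first hit seen = best hit
        Map<String, Integer> counts = new HashMap<>();
        for (Hit h : hits) {
            best.putIfAbsent(h.commonId(), h);
            counts.merge(h.commonId(), 1, Integer::sum);
        }
        List<Group> groups = new ArrayList<>();
        for (Map.Entry<String, Hit> e : best.entrySet()) {
            if (groups.size() == maxGroups) {
                break;
            }
            groups.add(new Group(e.getValue(), counts.get(e.getKey())));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("d1", "t1", 3.0f), new Hit("d2", "t2", 2.5f),
                new Hit("d3", "t1", 2.0f), new Hit("d4", "t3", 1.0f),
                new Hit("d5", "t2", 0.5f));
        for (Group g : collapse(hits, 10)) {
            System.out.println(g.best().commonId() + ": " + g.best().docId()
                    + " (" + g.matchCount() + " matching entries)");
        }
    }
}
```

The counts only cover the hits you actually fetched, so to fill a page of k discussions you may need to request more than k hits from Lucene, since a thread with many matching entries consumes several of them.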
Since version 3.2, Lucene supports grouping search results by a field: http://lucene.apache.org/core/4_1_0/grouping/org/apache/lucene/search/grouping/package-summary.html