Lucene group by
Hi, I have an index of simple documents, each with 2 fields:
1. profileId as long
2. profileAttribute as long.
I need to know how many profileIds have a certain set of attributes.
For example, I index:
doc1: profileId:1 , profileAttribute = 55
doc2: profileId:1 , profileAttribute = 57
doc3: profileId:2 , profileAttribute = 55
and I want to know how many profiles have both attributes 55 and 57. In this example the answer is 1, because only profileId 1 has both attributes.
Thanks in advance for your help.
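You can search for profileAttribute:(57 OR 55), then iterate over the results and group them by profileId, counting only the profileIds that matched every attribute in your set. Simply putting the profileIds in a set and counting them would also include profile 2, which has only attribute 55.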
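As a rough illustration of the iterate-and-count step, here is a minimal sketch in plain Java. The Hit class and the countProfilesWithAll method are hypothetical names standing in for the documents returned by the query; nothing here is a Lucene API. Since the question asks for profiles that have every attribute, the sketch records the matched attributes per profileId instead of only collecting profileIds.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for search results; not part of Lucene.
class ProfileCountSketch {

    // One matching document: a (profileId, profileAttribute) pair.
    static final class Hit {
        final long profileId;
        final long profileAttribute;

        Hit(long profileId, long profileAttribute) {
            this.profileId = profileId;
            this.profileAttribute = profileAttribute;
        }
    }

    // Counts the profiles whose matching documents cover every wanted attribute.
    static int countProfilesWithAll(List<Hit> hits, Set<Long> wanted) {
        Map<Long, Set<Long>> seen = new HashMap<>();
        for (Hit hit : hits) {
            if (wanted.contains(hit.profileAttribute)) {
                seen.computeIfAbsent(hit.profileId, id -> new HashSet<>())
                    .add(hit.profileAttribute);
            }
        }
        int count = 0;
        for (Set<Long> attributes : seen.values()) {
            // Only wanted attributes were stored, so equal size means all matched.
            if (attributes.size() == wanted.size()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The three documents from the question.
        List<Hit> hits = List.of(new Hit(1, 55), new Hit(1, 57), new Hit(2, 55));
        System.out.println(countProfilesWithAll(hits, Set.of(55L, 57L))); // prints 1
    }
}
```

The per-profile map grows with the number of distinct profiles in the result, which is part of why this approach gets expensive on large result sets.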
But be aware that Lucene will perform poorly at this compared to, say, an RDBMS. Lucene is an inverted index, so it is very good at retrieving the top documents that match a query, but not very good at iterating over the stored fields of a large number of documents.
However, if profileId is single-valued and indexed, you can get its values from Lucene's fieldCache, which avoids costly disk accesses. The drawback is that the fieldCache uses a lot of memory (depending on the size of your index) and takes time to load every time you (re-)open your index.
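To make the idea concrete, here is a small sketch that assumes the cache behaves like an in-memory array indexed by document number. The plain long[] and the profileIdsOf method are illustrative assumptions, not Lucene calls. It runs one lookup per attribute and intersects the resulting profileId sets.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a plain array stands in for a per-document value cache.
class FieldCacheSketch {

    // Assumption: profileIdByDoc[docId] holds the profileId of that document.
    static Set<Long> profileIdsOf(long[] profileIdByDoc, int[] matchingDocIds) {
        Set<Long> ids = new HashSet<>();
        for (int docId : matchingDocIds) {
            ids.add(profileIdByDoc[docId]); // array read, no stored-field access
        }
        return ids;
    }

    public static void main(String[] args) {
        // Documents 0, 1, 2 correspond to doc1, doc2, doc3 in the question.
        long[] profileIdByDoc = {1L, 1L, 2L};
        int[] docsWith55 = {0, 2}; // documents matching profileAttribute:55
        int[] docsWith57 = {1};    // documents matching profileAttribute:57

        Set<Long> both = profileIdsOf(profileIdByDoc, docsWith55);
        both.retainAll(profileIdsOf(profileIdByDoc, docsWith57));
        System.out.println(both.size()); // prints 1
    }
}
```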
If changing the index format is acceptable, this solution can be improved by making profileIds unique. Your index would then have the following format:
doc1: profileId: [1], profileAttribute: [55, 57]
doc2: profileId: [2], profileAttribute: [55]
The difference is that profileIds are unique and profileAttribute is now a multi-valued field. To count the number of profileIds for a given set of profileAttribute values, you now only need to query for the list of attributes, requiring all of them to match (for example profileAttribute:(55 AND 57)), and use a TotalHitCountCollector.
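The effect of the restructured index can be sketched in plain Java as follows. The map below is a hypothetical in-memory model of the new format (one entry per profileId, holding all of its attributes), and countHits only mimics what counting the hits of an all-attributes query would return; it does not call Lucene.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the one-document-per-profile format.
class UniqueProfileIndexSketch {

    // Counts the documents whose multi-valued attribute field contains every wanted value.
    static int countHits(Map<Long, Set<Long>> attributesByProfile, Set<Long> wanted) {
        int hits = 0;
        for (Set<Long> attributes : attributesByProfile.values()) {
            if (attributes.containsAll(wanted)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> index = Map.of(
            1L, Set.of(55L, 57L), // doc1: profileId 1
            2L, Set.of(55L));     // doc2: profileId 2
        System.out.println(countHits(index, Set.of(55L, 57L))); // prints 1
    }
}
```

Because each profile is a single document, the hit count is the answer and no per-document field values need to be read.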