Lucene group by

Hi, I have indexed simple documents that have 2 fields:

1. profileId as long

2. profileAttribute as long.

I need to know how many profileIds have a certain set of attributes.

For example, I index:

doc1: profileId:1 , profileAttribute = 55
doc2: profileId:1 , profileAttribute = 57
doc3: profileId:2 , profileAttribute = 55

I want to know how many profiles have both attribute 55 and attribute 57. In this example the answer is 1, because only profileId 1 has both attributes.

Thanks in advance for your help.


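You can search for profileAttribute:(57 OR 55), then iterate over the results and record, for each profileId, which of the requested attributes matched. The number of profileIds for which every requested attribute was seen is the count you want. (Putting the profileIds in a plain set would only give the number of unique profileIds that have at least one of the attributes.)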
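The grouping step can be sketched in plain Java. This is a minimal illustration, not Lucene code: the class `ProfileAttributeCounter`, the `Hit` type and the method `countProfilesWithAll` are hypothetical names, and the list of hits stands in for the stored fields you would read from each matching document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ProfileAttributeCounter {

    /** Hypothetical holder for the two stored fields of one matching document. */
    static final class Hit {
        final long profileId;
        final long profileAttribute;

        Hit(long profileId, long profileAttribute) {
            this.profileId = profileId;
            this.profileAttribute = profileAttribute;
        }
    }

    /** Counts the profiles whose hits cover every attribute in {@code required}. */
    static int countProfilesWithAll(List<Hit> hits, Set<Long> required) {
        // profileId -> requested attributes seen so far for that profile
        Map<Long, Set<Long>> seen = new HashMap<>();
        for (Hit hit : hits) {
            if (required.contains(hit.profileAttribute)) {
                seen.computeIfAbsent(hit.profileId, id -> new HashSet<>())
                    .add(hit.profileAttribute);
            }
        }
        int count = 0;
        for (Set<Long> attributes : seen.values()) {
            // Only requested attributes are stored, so equal size means all were seen.
            if (attributes.size() == required.size()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The three documents from the question.
        List<Hit> hits = List.of(new Hit(1, 55), new Hit(1, 57), new Hit(2, 55));
        System.out.println(countProfilesWithAll(hits, Set.of(55L, 57L))); // prints 1
    }
}
```

Memory grows with the number of distinct profileIds among the hits, which is part of why this approach gets expensive on large result sets.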

Be aware, though, that Lucene will perform poorly at this compared to, say, an RDBMS. Lucene is an inverted index, which makes it very good at retrieving the top documents that match a query. It is, however, not very good at iterating over the stored fields of a large number of documents.

However, if profileId is single-valued and indexed, you can get its values from Lucene's FieldCache, which avoids costly disk accesses. The drawback is that the FieldCache uses a lot of memory (depending on the size of your index) and takes time to load every time you (re-)open your index.
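To illustrate why a field cache helps, here is a rough plain-Java model of the idea: the profileId of every document is held in an in-memory array indexed by document number, so grouping the hits is an array lookup instead of a stored-field read. This does not use Lucene's actual FieldCache API; `ProfileIdCacheSketch`, `profileIdByDoc` and `countProfilesWithHits` are hypothetical names, and the sketch assumes each (profileId, profileAttribute) pair is indexed only once, so a profile has all requested attributes exactly when it has one hit per attribute.

```java
import java.util.HashMap;
import java.util.Map;

final class ProfileIdCacheSketch {

    /**
     * Counts the profiles that own exactly {@code requiredHits} matching documents.
     * Assumes each (profileId, attribute) pair was indexed at most once.
     */
    static int countProfilesWithHits(long[] profileIdByDoc, int[] matchingDocs, int requiredHits) {
        // profileId -> number of matching documents for that profile
        Map<Long, Integer> hitsPerProfile = new HashMap<>();
        for (int doc : matchingDocs) {
            // In-memory array lookup: no stored-field access per document.
            hitsPerProfile.merge(profileIdByDoc[doc], 1, Integer::sum);
        }
        int count = 0;
        for (int hits : hitsPerProfile.values()) {
            if (hits == requiredHits) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] profileIdByDoc = {1L, 1L, 2L}; // doc1, doc2, doc3 from the question
        int[] matchingDocs = {0, 1, 2};       // hits for profileAttribute:(57 OR 55)
        System.out.println(countProfilesWithHits(profileIdByDoc, matchingDocs, 2)); // prints 1
    }
}
```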

If changing the index format is acceptable, this solution can be improved by making profileIds unique. Your index would then have the following format:

doc1: profileId: [1], profileAttribute: [55, 57]
doc2: profileId: [2], profileAttribute: [55]

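The difference is that each profileId now appears in exactly one document and profileAttribute is a multi-valued field. To count the number of profileIds for a given set of attributes, you only need to query for the attribute values, making every one of them mandatory (for example profileAttribute:(+55 +57)), and use a TotalHitCountCollector: because there is one document per profile, the hit count is the answer.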
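The effect of the restructured index can be modelled in a few lines of plain Java. This is only a toy inverted index under stated assumptions, not Lucene itself: `ProfileIndexSketch`, `addProfile` and `countWithAll` are hypothetical names. Each attribute value maps to the set of profiles whose document contains it, and requiring all attributes amounts to intersecting those sets, so the size of the intersection plays the role of the total hit count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ProfileIndexSketch {

    // attribute value -> profileIds of the documents that contain it
    private final Map<Long, Set<Long>> postings = new HashMap<>();

    /** Indexes one document: a unique profileId with its multi-valued attributes. */
    void addProfile(long profileId, long... attributes) {
        for (long attribute : attributes) {
            postings.computeIfAbsent(attribute, a -> new HashSet<>()).add(profileId);
        }
    }

    /** Counts the profiles that have every attribute in {@code required} (0 if the list is empty). */
    int countWithAll(List<Long> required) {
        Set<Long> result = null;
        for (Long attribute : required) {
            Set<Long> profiles = postings.getOrDefault(attribute, Set.of());
            if (result == null) {
                result = new HashSet<>(profiles); // copy so the postings are not modified
            } else {
                result.retainAll(profiles);       // keep only profiles that also have this attribute
            }
        }
        return result == null ? 0 : result.size();
    }

    public static void main(String[] args) {
        ProfileIndexSketch index = new ProfileIndexSketch();
        index.addProfile(1L, 55L, 57L); // doc1
        index.addProfile(2L, 55L);      // doc2
        System.out.println(index.countWithAll(List.of(55L, 57L))); // prints 1
    }
}
```

No per-document grouping is needed here, which is the point of making profileId unique: the count falls out of the query itself.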
