distinct SOLR field values without count
My question is pretty similar to this question
The difference: I'd need the least RAM-intensive way to gather information about the distinct values. I DON'T care about the actual count in this case; I just want to know the possible values for that field. I'm constantly running out of heap space (30 million+ documents), and there has to be some way/parameter to do this in a memory-saving way.

If the number of distinct values is high, you will probably need to do facet paging. Use the facet.offset and facet.limit parameters.
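A minimal Python sketch of facet paging, assuming a core reachable at a `/select` URL and the default JSON facet layout (a flat `[value, count, value, count, ...]` list). `facet.limit` sets the page size, `facet.offset` moves through the pages, `rows=0` skips documents, and `facet.sort=index` keeps the order stable between pages. The helper names (`facet_page_params`, `iter_distinct_values`, `http_fetch`) are hypothetical. Paging mainly keeps each response and the client small. How much server-side heap it saves depends on the Solr version and facet method.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

Params = Dict[str, str]

def facet_page_params(field: str, offset: int, limit: int) -> Params:
    """Parameters for one page of facet values on `field` (no documents returned)."""
    return {
        "q": "*:*",
        "rows": "0",                  # only facet data, no documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": str(limit),    # page size
        "facet.offset": str(offset),  # where this page starts
        "facet.mincount": "1",        # skip values matching no documents
        "facet.sort": "index",        # lexical order keeps pages stable
        "wt": "json",
    }

def http_fetch(select_url: str) -> Callable[[Params], Dict[str, Any]]:
    """Return a fetcher that GETs select_url with the params and decodes the JSON body."""
    def fetch(params: Params) -> Dict[str, Any]:
        with urlopen(f"{select_url}?{urlencode(params)}") as resp:
            return json.load(resp)
    return fetch

def iter_distinct_values(field: str,
                         fetch: Callable[[Params], Dict[str, Any]],
                         page_size: int = 1000) -> Iterator[str]:
    """Yield every distinct value of `field`, one facet page at a time."""
    offset = 0
    while True:
        resp = fetch(facet_page_params(field, offset, page_size))
        # Flat layout: [value1, count1, value2, count2, ...]
        flat = resp["facet_counts"]["facet_fields"][field]
        values = flat[0::2]  # keep the values, drop the counts
        yield from values
        if len(values) < page_size:  # a short page means we reached the end
            return
        offset += page_size
```

Usage, with a hypothetical core name: `for v in iter_distinct_values("category", http_fetch("http://localhost:8983/solr/mycore/select")): print(v)`. Passing `fetch` in as a function keeps the paging logic testable without a running Solr.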
Use the StatsComponent to retrieve a list of distinct values for a certain field: https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
Parameter stats.calcdistinct:
If true, distinct values will be calculated and returned as "countDistinct" and "distinctValues" in the response. This calculation may be expensive for some fields, so it is false by default. If you'd only like to return distinct values for specific fields, you can also specify f.<field>.stats.calcdistinct, replacing <field> with your field name, to limit the distinct value calculation to the required field.
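A small Python sketch of the query described above. It assumes a `/select` handler with the StatsComponent enabled and JSON output, with the values read from `stats.stats_fields.<field>`. The per-field `f.<field>.stats.calcdistinct` form limits the expensive calculation to one field. The helper names and the core URL in the usage note are hypothetical.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

def stats_distinct_params(field: str) -> Dict[str, str]:
    """Ask the StatsComponent for the distinct values of `field` only."""
    return {
        "q": "*:*",
        "rows": "0",                              # no documents needed
        "stats": "true",
        "stats.field": field,
        f"f.{field}.stats.calcdistinct": "true",  # per-field override
        "wt": "json",
    }

def select_url(base: str, params: Dict[str, str]) -> str:
    """Build a GET URL for the core's /select handler."""
    return f"{base}/select?{urlencode(params)}"

def parse_distinct(resp: Dict[str, Any], field: str) -> Tuple[List[Any], int]:
    """Pull distinctValues and countDistinct out of a decoded JSON stats response."""
    field_stats = resp["stats"]["stats_fields"][field]
    return field_stats["distinctValues"], field_stats["countDistinct"]
```

For example, `select_url("http://localhost:8983/solr/mycore", stats_distinct_params("category"))` gives a URL you can fetch and decode, then pass to `parse_distinct`. All distinct values come back in a single response, so this suits fields with a moderate number of values better than very high-cardinality ones.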
To keep the load down, retrieve the values as rarely as possible: cache the result and retrieve it again only when the data has changed.
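One way to follow that advice is sketched below in Python. It is a minimal client-side cache that reruns the expensive query only when a version token changes. The class name is hypothetical, and so is the version token. It could be anything that changes when your data changes, for example a timestamp you record after each reindex.

```python
from typing import Callable, Hashable, List, Optional

class DistinctValuesCache:
    """Keep the last distinct-values result; reload only when the version token changes."""

    def __init__(self, load: Callable[[], List[str]]) -> None:
        self._load = load  # e.g. a function running the facet or stats query
        self._version: Optional[Hashable] = None
        self._values: Optional[List[str]] = None

    def get(self, version: Hashable) -> List[str]:
        # Recompute on first use, or when the caller reports that the data changed.
        if self._values is None or version != self._version:
            self._values = self._load()
            self._version = version
        return self._values
```

Repeated `get(v)` calls with the same `v` return the cached list without touching Solr, and a new `v` triggers one fresh query.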
If your index is slow in general, you might want to look at the cache configuration and/or give SOLR more RAM (if you have the means).
Originally answered here (by me):
https://stackoverflow.com/a/26714447/621690
I don't know about RAM usage, but you might want to try field collapsing. You will find the patch for Solr here.
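In later Solr releases, field collapsing is available as built-in result grouping (the `group` parameters), so a patch is no longer needed there. The Python sketch below assumes that grouping is available and that the response is JSON. It asks for one document per group, returns only the grouped field, and pages through the groups with `start`/`rows`. Each group's `groupValue` is one distinct value. The helper names are hypothetical.

```python
from typing import Any, Dict, List

def grouping_params(field: str, start: int, rows: int) -> Dict[str, str]:
    """One page of groups on `field`; each group corresponds to one distinct value."""
    return {
        "q": "*:*",
        "group": "true",
        "group.field": field,
        "group.limit": "1",   # one document per group is enough
        "fl": field,          # return only the grouped field
        "start": str(start),  # offset into the list of groups
        "rows": str(rows),    # number of groups per page
        "wt": "json",
    }

def group_values(resp: Dict[str, Any], field: str) -> List[Any]:
    """Extract the distinct values (groupValue) from a decoded grouped response."""
    return [g["groupValue"] for g in resp["grouped"][field]["groups"]]
```

Documents that lack the field end up in a group whose `groupValue` is null (`None` after decoding).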