Should I keep the size of stored fields in Solr to a minimum?

2023-03-24 07:00 问答作者：

I am looking to introduce Solr to po开发者_运维知识库wer the search for a business listing website. The site has around 2 million records.

There is a search results page which will display some key data for each result. I believe the data needed for this summary information is around 1KB per result.

I could simply index the fields needed for the search within Solr - but this means a separate database call for each result to populate the summary information. If Solr could return all of this data I would expect it to yield greater performance than ~40 database round-trips.

The concern is that Solr's memory usage would be too large (how might I calculate this?) and that indexing might take too long with the extra data.

You would benefit greatly to store those fields in Solr compared to the 40 db roundtrips. Just make sure that you marked the field as "not indexed" (indexed = false) in your schema config and maybe also compressed (compressed = true) (however this will of course use some CPU when indexing and retrieving).

When marking a field as "not indexed" no analyzers will process the field when indexing making it stored much faster than a indexed field.

It's a trade off, and you will have to analyze this yourself.

Solr's performance greatly depends on caching, not only of queries, but also of the documents themselves. Those caches depend on memory, and the bigger your documents are, the less you can fit in a fixed amount of memory.

Document size also affects index size and replication times. For large indices with master slave configurations, this can impact the rate at which you can update the index.

Ideally you should measure cache hit rates at different cache sizes, with and without the fields. If you can spend the memory to get a high enough cache hit rate with the fields, then by all means go for it. If you cannot, you may have to fetch the document content from another system.

There is a third alternative you didn't mention, which is to store the documents outside of the DB, but not in Solr. They should be stored in a format which is as close as possible to what you deliver with search results. The code which creates/updates the indices could create/update these documents as well. This is a lot of work, but like everything it comes down to how much performance you need and what you are willing to do to get it.

EDIT: For measuring cache hit rates and throughput, I've found the best test source is your current query logs. Take a day or two worth of live queries and run them against different indexes and configurations to see how well they work.

继续阅读：architecture performance search-engine solr

Should I keep the size of stored fields in Solr to a minimum?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？