Solr for constantly updating index

I have a news site with 150,000 news articles. About 250 new articles are added to the database daily, at intervals of 5-15 minutes. I understand that Solr is optimized for millions of records and my 150K won't be a problem for it. But I am worried that the frequent updates will be a problem, since the cache gets invalidated with every update. On my dev server, a cold load of a page takes 5-7 seconds (since every page runs a few MLT queries).

Will it help if I split my index into two: an archive index and a latest index? The archive index would be updated only once a day.
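If it helps clarify the idea, this is roughly how I imagine querying the two indexes together so callers still see a single result list. This is only a sketch over the HTTP API; the host, the core names latest and archive, and the query field are placeholders for whatever the real setup would use.

    import requests

    # Placeholder host and core names for the proposed split.
    SOLR_HOST = "localhost:8983"

    params = {
        "q": "title:economy",
        "rows": 10,
        "wt": "json",
        # Distributed search: fan the query out to both cores and merge
        # the results, so the caller still gets one ranked list back.
        "shards": f"{SOLR_HOST}/solr/latest,{SOLR_HOST}/solr/archive",
    }

    resp = requests.get(f"http://{SOLR_HOST}/solr/latest/select", params=params)
    print(resp.json()["response"]["numFound"])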

Can anyone suggest any ways to optimize my installation for a constantly updating index?

Thanks


My answer is: test it! Don't try to optimize yet if you don't know how it performs. Like you said, 150K is not a lot; it should be quick to build an index of that size for your tests. After that, run a couple of MLT queries from a few concurrent threads (to simulate users) while you index more documents, to see how it behaves.

One setting that you should keep an eye on is auto-commit. Since you are indexing constantly, you can't commit on every document (you would bring Solr down). The value you choose for this setting lets you tune the latency of the system (how long it takes for new documents to show up in results) while keeping the system responsive.
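Auto-commit itself is configured in solrconfig.xml (the autoCommit maxDocs/maxTime values). A related knob you can set from the indexing side is the commitWithin parameter on the update request, which asks Solr to batch commits instead of forcing one per add. A minimal sketch of that, assuming a core named articles and illustrative field names (none of these are from your setup):

    import requests

    # Hypothetical Solr core URL; adjust to your installation.
    SOLR_UPDATE_URL = "http://localhost:8983/solr/articles/update"

    # A batch of newly published articles (illustrative field names).
    docs = [
        {"id": "article-150001", "title": "New article", "body": "Article text..."},
    ]

    # commitWithin=60000 asks Solr to make these documents searchable within
    # 60 seconds rather than committing on every add. Raising or lowering it
    # trades result freshness against cache/searcher churn.
    resp = requests.post(
        SOLR_UPDATE_URL,
        params={"commitWithin": 60000},
        json=docs,
    )
    resp.raise_for_status()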


Consider using mlt=true in the main query instead of issuing a separate MoreLikeThis query per result. You'll save the round trips, so it will be faster.
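A sketch of what that can look like over the HTTP API, assuming a core named articles and the fields title and body (all illustrative, not from the question):

    import requests

    # Hypothetical Solr core URL and field names; adjust to your schema.
    SOLR_SELECT_URL = "http://localhost:8983/solr/articles/select"

    params = {
        "q": "category:politics",   # the page's main query
        "rows": 10,
        "wt": "json",
        # Enable the MoreLikeThis component on the same request, so similar
        # documents come back alongside the main results in one round trip.
        "mlt": "true",
        "mlt.fl": "title,body",     # fields to use for similarity
        "mlt.count": 5,             # similar docs to return per main result
    }

    resp = requests.get(SOLR_SELECT_URL, params=params)
    data = resp.json()

    main_docs = data["response"]["docs"]
    # Similar documents come back keyed by the unique id of each main result.
    similar_by_id = data.get("moreLikeThis", {})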
