
Improving Sphinx Index Build Time

Hey guys, I have a question about Sphinx. I use Sphinx for full-text search on my sites, and it works like a dream. Currently it takes about 30 minutes to build the indexes for all my databases. That's fine, since I only run the indexing script once an hour.

But the databases are growing quickly, and I'm afraid they will soon be so big that indexing won't finish within an hour. Of course I could run it only once every 2 hours, but that's not ideal.

Now my question: does Sphinx rebuild the entire index every time the script runs, or does it only add the items that were added to the database since the last indexing run?

My gut feeling is that it rebuilds everything.

If that's true, is it possible to index ONLY the items that aren't in the index yet? Would that make indexing much faster?


See Delta index updates: http://sphinxsearch.com/docs/current.html#delta-updates

The idea is to maintain two indexes: main and delta. You only rebuild the delta index, which covers newly added or updated content, and periodically merge it back into the main index.
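To make the main/delta idea concrete, here is a toy in-memory simulation in Python. It is not Sphinx code and uses no Sphinx API; it only mirrors the workflow. The `max_main_id` field plays the role of the counter that the Sphinx docs store in a database table so the delta source can select only rows with a larger id, and `merge_delta` stands in for `indexer --merge main delta --rotate`. As an assumption of this sketch, new rows are recognized purely by a growing integer id, so updates and deletes of existing rows are not handled.

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real Sphinx tokenization is far richer.
    return text.lower().split()


class InvertedIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def merge_from(self, other):
        # Fold another index's postings into this one.
        for term, ids in other.postings.items():
            self.postings[term] |= ids

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


class MainDeltaIndex:
    def __init__(self):
        self.main = InvertedIndex()
        self.delta = InvertedIndex()
        self.max_main_id = 0   # like the counter row: highest id already in main
        self.max_delta_id = 0  # highest id currently covered by main + delta

    def rebuild_main(self, rows):
        # Expensive full rebuild over every row (the "monolithic" case).
        self.main = InvertedIndex()
        for doc_id, text in rows:
            self.main.add(doc_id, text)
        self.max_main_id = max((doc_id for doc_id, _ in rows), default=0)
        self.delta = InvertedIndex()
        self.max_delta_id = self.max_main_id

    def rebuild_delta(self, rows):
        # Cheap rebuild: only rows newer than the last main build are indexed.
        self.delta = InvertedIndex()
        self.max_delta_id = self.max_main_id
        for doc_id, text in rows:
            if doc_id > self.max_main_id:
                self.delta.add(doc_id, text)
                self.max_delta_id = max(self.max_delta_id, doc_id)

    def merge_delta(self):
        # Periodically fold delta into main and start over with an empty delta.
        self.main.merge_from(self.delta)
        self.max_main_id = self.max_delta_id
        self.delta = InvertedIndex()

    def search(self, term):
        # Queries consult both indexes so fresh rows are found before a merge.
        return self.main.search(term) | self.delta.search(term)
```

The hourly job then only calls `rebuild_delta`, whose cost grows with the number of new rows rather than with the size of the whole database; the full `rebuild_main` or `merge_delta` can run much less often.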


Split your index.

Main and delta is the most conventional approach if you can easily identify new/updated rows.

If not, splitting lets you reindex many "parts" at the same time.
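Reindexing the parts in parallel can be driven by a small script. The sketch below assumes Python's standard `subprocess` and `concurrent.futures` modules and launches one `indexer --config ... --rotate <index>` process per part; the config path and the part names are hypothetical placeholders, not taken from the answer. The `run` parameter exists only so the function can be exercised without Sphinx installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical config path; adjust to your installation.
DEFAULT_CONFIG = "/etc/sphinx/sphinx.conf"


def indexer_command(index_name, config_path=DEFAULT_CONFIG):
    # --rotate builds into new files and signals searchd to swap them in,
    # so the part keeps serving queries while it is being rebuilt.
    return ["indexer", "--config", config_path, "--rotate", index_name]


def reindex_parts(index_names, max_workers, config_path=DEFAULT_CONFIG, run=subprocess.run):
    """Rebuild several partial indexes concurrently; returns {name: exit code}."""

    def build(name):
        result = run(indexer_command(name, config_path), capture_output=True, text=True)
        return name, result.returncode

    # Threads are enough here: the heavy work happens in separate indexer processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(build, index_names))
```

For example, `reindex_parts([f"part_{i:02d}" for i in range(1, 21)], max_workers=20)` would rebuild twenty hypothetical parts at once. Capping `max_workers` below the core count leaves headroom for `searchd` and the database.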

I had an index of 180M rows (randomly updated), with an average of 5K new rows every hour.

My solution was:

22 partial indexes (because the server has 24 cores): 20 partial indexes plus 2 "delta" indexes.

With a script, I computed the start and end id of each part using these rules:

  • last delta: starts at the first id created today
  • other delta: starts at the first id created yesterday and ends at the first id created today

For the partial indexes, it was basically:

  • START = (position - 1) * (first id created yesterday / 20)
  • END = position * (first id created yesterday / 20)
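The range rules above can be sketched as a small Python function. Assumptions of this sketch, not stated in the answer: ranges are half-open (start inclusive, end exclusive), the last partial block absorbs the remainder left by integer division so no ids are skipped, and the index names (`part_01`, `delta_yesterday`, `delta_today`) and the `id` column are hypothetical. The `where_clause` helper shows the kind of condition each part's `sql_query` could filter on.

```python
def partition_ranges(first_id_yesterday, first_id_today, parts=20):
    """Split ids into `parts` static blocks plus two deltas.

    Ranges are half-open (start, end); end=None means "no upper bound".
    """
    step = first_id_yesterday // parts
    ranges = {}
    for position in range(1, parts + 1):
        start = (position - 1) * step
        # The last block absorbs the remainder left by integer division.
        end = first_id_yesterday if position == parts else position * step
        ranges[f"part_{position:02d}"] = (start, end)
    # The "other" delta: rows created yesterday.
    ranges["delta_yesterday"] = (first_id_yesterday, first_id_today)
    # The last delta: everything created today, open-ended.
    ranges["delta_today"] = (first_id_today, None)
    return ranges


def where_clause(start, end, column="id"):
    # SQL condition a partial index's sql_query could filter on.
    if end is None:
        return f"{column} >= {start}"
    return f"{column} >= {start} AND {column} < {end}"
```

With the default `parts=20` this yields 22 ranges, matching the 20 partial plus 2 delta indexes described above. Only `delta_today` changes every hour; the other ranges stay fixed for the whole day.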

The last delta was rebuilt in a few seconds; the first 20 blocks were built in 10 minutes.

Before, with the monolithic version, the full index took 4 hours to rebuild (but there were only 90M rows, and it was Sphinx 0.9.9).
