
Fastest full text search today?

Spoiler: this is just another Lucene vs. Sphinx vs. whatever thread.

I saw that all the other threads were almost two years old, so I decided to start a new one.

Here are the requirements:

data size: 10 GB max

rows: approaching billions

indexing should be fast

searching should take under 0 ms (OK, joke... laugh... but keep it as low as possible)

In today's world, which tool should I use, and how should I go about it?

Edit: I did some timing on Lucene; indexing 1.8 GB of data took 5 minutes.
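
A quick back-of-the-envelope check on that figure, assuming indexing time grows roughly linearly with input size (an assumption; segment merging can make large indexes cost somewhat more than linear):

```latex
\text{rate} = \frac{1.8\ \text{GB}}{5\ \text{min}} = 0.36\ \text{GB/min} \approx 6\ \text{MB/s},
\qquad
t_{10\,\text{GB}} \approx \frac{10\ \text{GB}}{0.36\ \text{GB/min}} \approx 28\ \text{min}
```

So at the measured rate, the full 10 GB would take on the order of half an hour to index.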

Searching is pretty fast unless I run a wildcard query like "a*", which takes 400 to 500 ms.
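
Why "a*" is slower than a plain term query: an inverted index keeps its terms sorted, so a prefix wildcard has to enumerate every term that starts with "a" and merge all of their posting lists, while a single-term query reads just one list. The toy Java sketch below illustrates that cost on a hypothetical in-memory index; the class and method names are made up for illustration, and this is not Lucene's implementation or API.

```java
import java.util.*;

public class PrefixQueryDemo {
    // Term dictionary: terms in sorted order, each mapped to the sorted doc IDs containing it (its postings).
    private final TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            index.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Exact term query: one dictionary lookup, one posting list.
    public Set<Integer> termQuery(String term) {
        return Collections.unmodifiableSet(index.getOrDefault(term, new TreeSet<>()));
    }

    // All terms starting with the prefix form one contiguous sorted range,
    // from prefix (inclusive) up to prefix + Character.MAX_VALUE (exclusive).
    private SortedMap<String, TreeSet<Integer>> expand(String prefix) {
        return index.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // Prefix query ("a*"): visit every matching term and union its posting list.
    // The work grows with the number of matching terms, not only with the number of hits.
    public Set<Integer> prefixQuery(String prefix) {
        Set<Integer> result = new TreeSet<>();
        for (TreeSet<Integer> postings : expand(prefix).values()) {
            result.addAll(postings);
        }
        return result;
    }

    // How many posting lists the prefix query had to read.
    public int expandedTermCount(String prefix) {
        return expand(prefix).size();
    }

    public static void main(String[] args) {
        PrefixQueryDemo idx = new PrefixQueryDemo();
        idx.addDocument(1, "apple banana");
        idx.addDocument(2, "apricot avocado");
        idx.addDocument(3, "banana cherry");
        idx.addDocument(4, "almond apple");
        System.out.println("apple -> " + idx.termQuery("apple"));      // apple -> [1, 4]
        System.out.println("a* -> " + idx.prefixQuery("a")
                + " (expanded to " + idx.expandedTermCount("a") + " terms)"); // a* -> [1, 2, 4] (expanded to 4 terms)
    }
}
```

With billions of rows, a one-letter prefix can match a very large share of the term dictionary, which is consistent with "a*" being much slower than ordinary lookups.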

My biggest worry is indexing, which is taking a very long time and a lot of resources!


I have no experience with anything other than Lucene. It's pretty much the default indexing solution, so I don't think you can go too wrong with it.

10 GB is not a lot of data. You'll be able to re-index it fairly quickly, or keep it on SSDs for extra speed. And of course you can keep your whole index in RAM (which Lucene supports) for very fast lookups.


Please check the Lucene wiki for tips on improving Lucene indexing speed; the page is quite succinct. In general, Lucene is quite fast (it is used for real-time search). The tips will help you figure out whether you are missing something "obvious."
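
One tip commonly given for Lucene indexing speed is to give the writer a larger in-memory buffer (the IndexWriter RAM buffer, set via setRAMBufferSizeMB) so it flushes fewer, larger segments to disk instead of many small ones. The toy Java sketch below is not Lucene code: "BufferedIndexer" is a hypothetical class, and its buffer limit counts postings rather than megabytes, purely to show how the buffer size changes the number of flushes.

```java
import java.util.*;

public class BufferedIndexer {
    // Hypothetical stand-in for a RAM budget: max postings held in memory before a flush.
    private final int maxBufferedPostings;
    private final Map<String, List<Integer>> buffer = new HashMap<>();
    private int bufferedPostings = 0;
    // Each flushed buffer becomes one immutable "segment".
    private final List<Map<String, List<Integer>>> segments = new ArrayList<>();

    public BufferedIndexer(int maxBufferedPostings) {
        this.maxBufferedPostings = maxBufferedPostings;
    }

    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            buffer.computeIfAbsent(token, t -> new ArrayList<>()).add(docId);
            bufferedPostings++;
        }
        // Flush only when the buffer is full, not after every document.
        if (bufferedPostings >= maxBufferedPostings) flush();
    }

    // A real engine would write the segment to disk here; this sketch just keeps it in a list.
    public void flush() {
        if (buffer.isEmpty()) return;
        segments.add(new HashMap<>(buffer));
        buffer.clear();
        bufferedPostings = 0;
    }

    public int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        for (int budget : new int[] {10, 100, 1000}) {
            BufferedIndexer indexer = new BufferedIndexer(budget);
            for (int i = 0; i < 1000; i++) {
                indexer.addDocument(i, "doc " + i + " some shared words"); // 5 postings per doc
            }
            indexer.flush(); // final flush at the end, like a single commit
            // Prints 500, 50 and 5 segments for the three budgets.
            System.out.println("buffer of " + budget + " postings -> " + indexer.segmentCount() + " segments");
        }
    }
}
```

Fewer, larger flushes also leave fewer segments to merge later, which is part of why the buffer size matters for total indexing time.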


As for your biggest worry, indexing taking a long time and a lot of resources:

Take a look at LuSql. We used it once; FWIW, 100 GB of data from MySQL took a little more than an hour to index on a decent machine, with the index on the filesystem (NTFS).
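
Converting that figure to a rate, taking "a little more than an hour" as roughly 60 minutes (so the true rate is slightly lower), and comparing it with the question's 1.8 GB in 5 minutes:

```latex
\frac{100\ \text{GB}}{60\ \text{min}} \approx 1.67\ \text{GB/min} \approx 28\ \text{MB/s},
\qquad
\frac{1.67\ \text{GB/min}}{1.8\ \text{GB} / 5\ \text{min}} = \frac{1.67}{0.36} \approx 4.6
```

That is roughly four to five times the question's throughput, though the hardware and data are not the same, so this is only a rough comparison.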

If you add SSDs or some other ultra-fast disk technology, you can bring that time down considerably.
