开发者

Fast, scalable string lookup

I have a set of 5 million strings. These are currently stored in a single column MySQL table. My application has to perform lookups and check if a given string is in the set. This can of course be done with a HashSet (in Java). But instead of building a custom solution, I was wondering if there are any existing, widely used, proven solutions that do this? It seems like a common scenario. The solution should be scalable (the set might increase beyond 5 million), have failover (so probably distributed) and perform well under a huge 开发者_如何学运维number of requests. Any suggestions?

Update: My app can also query to check if a given set of strings is present in the global (the 5 million one) set.


You can try Trie or Patricia-trie.The second is more memory efficient.Also here you can find a comparison of 2 data structures [Trie,TreeSet],In-memory database and their performance.


Try memcached, a high-performance, distributed memory object caching system. You lookup using key/value hashes. Facebook uses memcached as do many other highly scalable sites. Need to store more strings? Just add more memcached instances to the cluster. Plus you can use in a 2-tier caching setup where you first query memcached, if cache miss then query the full database.

Have you considered adding column indexing to your MySQL database? Hash, b-tree and r-tree are supported.

MySQL can also be replicated and clustered for high scalability.


While a Trie might be the best solution, binary search on the sorted list of strings should also perform well run time wise.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜