High performance querying - Suggestions please
Suppose I have millions of user profiles, each with hundreds of fields (name, gender, preferred pet and so on...).
I want to run searches on those profiles.
Ex.: all profiles that have an age between x and y, love butterflies, hate chocolate....
Which database would you choose?
Suppose that you have a Facebook-like load. Speed is a must. Open source preferred.
I've read a lot about Cassandra, HBase, Mongo, MySQL... I just can't decide.....
It's all about using effective indexes. If you have a specific query, make an index for that query.
Ex. make an index age_lovebutterflies_hateschocolate
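To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample data are made up for illustration; only the idea (one composite index per known query) comes from the advice above, and the index is named along the lines of the example.

```python
import sqlite3

# Hypothetical schema: three of the "hundreds of fields" from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    " id INTEGER PRIMARY KEY,"
    " age INTEGER,"
    " loves_butterflies INTEGER,"
    " hates_chocolate INTEGER)"
)

# Made-up sample data: 1000 profiles with ages 18..67.
rows = [(i, 18 + i % 50, int(i % 2 == 1), int(i % 3 == 0)) for i in range(1000)]
conn.executemany("INSERT INTO profiles VALUES (?, ?, ?, ?)", rows)

# One composite index built for this exact query.
# Equality columns go first, the range column (age) goes last.
conn.execute(
    "CREATE INDEX age_lovebutterflies_hateschocolate"
    " ON profiles (loves_butterflies, hates_chocolate, age)"
)

query = (
    "SELECT id FROM profiles"
    " WHERE loves_butterflies = 1 AND hates_chocolate = 1"
    " AND age BETWEEN ? AND ?"
)

# Ask SQLite how it plans to run the query; the last column is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (20, 30)).fetchall()
plan_text = " ".join(row[3] for row in plan)

matches = sorted(r[0] for r in conn.execute(query, (20, 30)))
print(plan_text)
print(len(matches), "matching profiles")
```

The printed plan should mention the index by name, which is how you check that the query is actually served by it. Other databases have their own EXPLAIN output, so treat this as an illustration of the workflow rather than a benchmark.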
If you have a high-traffic site like Facebook, you will need more than SQL optimization alone. E.g. caching with memcached, a search engine like Vespa or Lucene/Solr running as a cluster, load balancers, multiple servers with 64 GB of RAM, RAID disks, and lots of other server technologies...
The problem with databases like MySQL, PostgreSQL, SQLite and Oracle is that indexes work well for fixed, known searches, but they are not flexible. Ex. if you combine searches over columns that are not indexed, the database cannot use an index for those columns and has to scan the candidate rows instead. Ex. if you add another parameter like gender, or maybe location, you would have to create more indexes... A real search engine such as Lucene/Solr is much more effective here, because you can combine fields in as many ways as you like... All you have to think about is whether a column is indexed, not which other columns it is indexed together with...
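Why a search engine copes with arbitrary combinations can be shown with a toy inverted index in plain Python. This is only a sketch of the general idea (one posting set per field/value pair, intersected at query time); it is not the Lucene or Solr API, and the profiles, field names and the build_index/search helpers are all invented for the example.

```python
from collections import defaultdict

# Made-up sample profiles.
profiles = {
    1: {"gender": "f", "loves": "butterflies", "hates": "chocolate", "location": "Oslo"},
    2: {"gender": "m", "loves": "dogs", "hates": "chocolate", "location": "Oslo"},
    3: {"gender": "m", "loves": "butterflies", "hates": "chocolate", "location": "Paris"},
    4: {"gender": "f", "loves": "butterflies", "hates": "spinach", "location": "Oslo"},
}


def build_index(profiles):
    """Map every (field, value) pair to the set of profile ids that have it."""
    index = defaultdict(set)
    for profile_id, fields in profiles.items():
        for field, value in fields.items():
            index[(field, value)].add(profile_id)
    return index


def search(index, **criteria):
    """Return ids matching ALL criteria by intersecting per-field posting sets."""
    postings = [index.get((field, value), set()) for field, value in criteria.items()]
    if not postings:
        return set()
    # Start from the smallest set so the intersection stays cheap.
    postings.sort(key=len)
    return set.intersection(*postings)


index = build_index(profiles)

# Each field is indexed once; any combination can be queried afterwards.
print(search(index, loves="butterflies", hates="chocolate"))
print(search(index, loves="butterflies", hates="chocolate", gender="f"))
print(search(index, location="Oslo", gender="f"))
```

Adding gender or location to a query needs no new index here, which is the flexibility described above. A real engine adds range queries, ranking, compression and distribution on top of this.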
So, Facebook... It's a long way to go, dude ;)