How scalable are automatic secondary indexes in Cassandra 0.7?
As far as I understand, automatic secondary indexes are generated for node-local data.
In that case, a query by secondary index involves every node that stores part of the column family (?). So, if I am right, when the data is spread across 50 nodes, are all 50 nodes involved in a single query?
How far can this scale? Is it more scalable than manual secondary indexes (an inverted index column family)? Does it work for a few nodes, or for hundreds of nodes?
See Stu's answer from the mailing list: http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html
Yes, if you need to fetch all indexed rows, then the index queries involve all nodes. But this is actually more efficient than building your own index! Details here.
However, if you look up only a few rows, and each index entry maps to very many rows, then it's likely that the very first node is able to answer your question. Your query will then involve only one node. From the Apache mailing list:
The first node can answer the question as long as you've requested less rows than the first node has on it. Hence the "low cardinality" point in what you quoted.
(by Jonathan Ellis, here.)
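To make the "first node can answer" point concrete, here is a toy simulation in Python. It is only a sketch of the idea, not Cassandra's actual code: the `Node` class, the `query` function, and the one-node-at-a-time scan are all assumptions made for illustration. Each node indexes only its own rows, and the query stops as soon as it has collected the requested number of rows.

```python
from collections import defaultdict


class Node:
    """Toy node (hypothetical): stores rows plus a local index on one column."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.index = defaultdict(list)  # column value -> row keys on this node

    def insert(self, key, value):
        self.rows[key] = value
        self.index[value].append(key)

    def lookup(self, value, limit):
        # Return at most `limit` local row keys matching the indexed value.
        return self.index.get(value, [])[:limit]


def query(nodes, value, limit):
    """Ask nodes in turn; stop once `limit` rows have been collected."""
    found, contacted = [], 0
    for node in nodes:
        contacted += 1
        found.extend(node.lookup(value, limit - len(found)))
        if len(found) >= limit:
            break
    return found, contacted


# 50 nodes, 100 rows each, on a low-cardinality column (two distinct values).
nodes = [Node(f"node{i}") for i in range(50)]
for i in range(5000):
    nodes[i // 100].insert(f"row{i}", "active" if i % 2 == 0 else "inactive")

few, n_few = query(nodes, "active", limit=10)         # small limit
all_rows, n_all = query(nodes, "active", limit=10_000)  # effectively "all rows"
print(n_few, n_all)
```

With a small limit the loop stops at the first node, because that node alone holds more matching rows than were requested. Asking for every matching row forces the loop to visit all 50 nodes, which is the case described in the question.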
(I also posted a follow-up to your question on the mailing list, inquisitor, because I didn't really understand the answer linked in Schildmeijer's answer.)