Cassandra datastore size
I am using Cassandra to store my parsed site logs. I have two column families with multiple secondary indices. The log data by itself is around 30 GB in size. However, the size of the Cassandra data dir is ~91 GB. Is there any way I can reduce the size of this store? Also, will having multiple secondary indices have a big impact on the datastore size?
Potentially, the secondary indices could have a big impact, but obviously it depends what you put in them! If most of your data entries appear in one or more indexes, then the indexes could form a significant proportion of your storage.
You can see how much space each column family is using with JConsole and/or 'nodetool cfstats'.
You can also look at the sizes of the disk data files to get some idea of usage.
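If you want a quick per-column-family breakdown of the files on disk, something like the sketch below may help. It assumes that data file names start with the column family name followed by a '-' (and that secondary index files carry their own distinct prefix), which depends on your Cassandra version, so check a directory listing first. The default path /var/lib/cassandra/data is also an assumption; pass your own data directory as the first argument.

```python
import os
import sys
from collections import defaultdict


def usage_by_prefix(data_dir):
    """Sum file sizes under data_dir, grouped by the file-name prefix.

    The prefix is everything before the first '-' in the file name.
    ASSUMPTION: that prefix identifies the column family (or index).
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File may have been removed (e.g. compacted away) mid-scan.
                continue
            totals[name.split("-", 1)[0]] += size
    return dict(totals)


def format_report(totals):
    """Return one line per prefix, largest first, sizes in MiB."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{prefix}: {size / 1024**2:.1f} MiB" for prefix, size in ordered]


if __name__ == "__main__":
    # ASSUMPTION: default data directory; override on the command line.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    for line in format_report(usage_by_prefix(data_dir)):
        print(line)
```

Running the same script against the commitlog directory gives a rough idea of how much space is held there rather than in the data files.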
It's also possible that data isn't being flushed to disk often enough. A commitlog segment can only be deleted once every column family with writes in it has been flushed, so infrequent flushes can leave lots of commitlog files on disk for a long time, occupying extra space. This can happen if some of your column families are only lightly loaded. See http://wiki.apache.org/cassandra/MemtableThresholds for parameters to tune this.
If you have very large numbers of small columns, then the column names may use a significant proportion of the storage, so it may be worth shortening them where this makes sense (not if they are timestamps or other meaningful data!).
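To get a feel for how much the names matter, here is a rough back-of-the-envelope sketch. The per-column overhead of 15 bytes (length fields, flags and timestamp) is an assumption about the on-disk format and may differ in your version; the function and parameter names are just for illustration.

```python
def name_fraction(name_bytes, value_bytes, overhead_bytes=15):
    """Estimate the share of one stored column taken up by its name.

    ASSUMPTION: each column costs name + value + a fixed overhead
    (default 15 bytes); compression and index files are ignored.
    """
    total = name_bytes + value_bytes + overhead_bytes
    return name_bytes / total


if __name__ == "__main__":
    # Hypothetical example: a 20-byte column name holding a 5-byte value.
    print(f"long name:  {name_fraction(20, 5):.0%} of the column")
    # The same value stored under a 2-byte name.
    print(f"short name: {name_fraction(2, 5):.0%} of the column")
```

With small values, the name can easily be the largest part of each column, which is why shortening it pays off.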