Efficient retrieval of column families

2023-03-12 23:19

Recently I've run into the problem of efficiently retrieving several columns from a single row in a single column family. I am currently using Pelops as my Cassandra API. The question is what to do when I want to get columns from several ranges. It would be easy if I could fetch columns from the family using several slices at once, but I can't.

For example, I have a family with an enormous number of columns. Some of them share a common prefix, say "group/xxx", where xxx is an identifier. There are also a couple of columns named, for example, "a", "b", "c". Now I want to get these columns together, so I have to define two slices and call getColumnsFromRow twice.

How can I solve this problem efficiently? Does Cassandra somehow cache a column family that was recently retrieved, so that calling getColumnsFromRow a second time does not search for it again?

Because you have rolled your own compound column names, you basically have to issue multiple get_slice calls.

This is not a terribly big deal efficiency-wise, since these columns are in the same row and, if you chose your comparator correctly, reading them should take a single disk seek. Subsequent queries to this same row should hit this portion of the table in the OS's disk cache (OS level, nothing to do with Cassandra).
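To make the two-request pattern concrete, here is a minimal in-memory sketch. It does not use Pelops or the Cassandra client API; it only models a row as a sorted list of column names, which is roughly what a byte-ordered comparator gives you. The column names and the helpers `slice_range` and `by_name` are hypothetical, and the range end is exclusive here purely for simplicity.

```python
from bisect import bisect_left


def slice_range(columns, start, finish):
    """Return the names in the sorted list `columns` with start <= name < finish."""
    lo = bisect_left(columns, start)
    hi = bisect_left(columns, finish)
    return columns[lo:hi]


def by_name(columns, names):
    """Return those of the requested `names` that exist in the sorted list `columns`."""
    found = []
    for name in names:
        i = bisect_left(columns, name)
        if i < len(columns) and columns[i] == name:
            found.append(name)
    return found


# Hypothetical wide row; sorting mimics a byte-ordered column comparator.
row = sorted(["a", "b", "c", "group/001", "group/002", "group/107", "other", "zeta"])

# Request 1: everything with the "group/" prefix.
# "0" is the character right after "/" in ASCII, so "group0" bounds the prefix.
grouped = slice_range(row, "group/", "group0")

# Request 2: the individually named columns.
named = by_name(row, ["a", "b", "c"])

result = named + grouped
print(result)
```

Because the prefixed columns sort next to each other, the first request reads one contiguous stretch of the row, which is the reason the comparator choice matters.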

Row caching was designed for small rows whose entire contents are accessed frequently (like a serialized object or similar). It will actually impose a substantial amount of memory pressure for large rows like this. I would recommend leaving the row cache disabled for this CF.

If you find you need to, you can do some additional tweaking by making the following adjustments:
- turn down read_repair_chance
- enable 'result pinning': https://github.com/apache/cassandra/blob/cassandra-0.7.0/conf/cassandra.yaml#L229-236

This will let your OS's file system cache work more efficiently, since the same hosts will be handling the same queries, and the subsequent slices will be operating on sections of the row that are ideally in the same SSTable and thus in the FS cache.

(Shameless plug, but actually quite helpful in these situations.) Also, consider downloading OpsCenter, which is free (http://www.datastax.com/opscenter), and watch the metrics for the column family as you experiment with the different options. This will give you an idea of the most efficient way to structure your queries specifically for your data.

Cassandra does have optional row caching, but this is likely to cost a lot of memory if your rows are very large, so it is probably not advisable.

(Row caching is configured per column family using the rows_cached, row_cache_save_period_in_seconds, and preload_row_cache properties in your storage configuration.)

http://wiki.apache.org/cassandra/StorageConfiguration says:

The row cache saves even more time, but must store the whole values of its rows, so it is extremely space-intensive. It's best to only use the row cache if you have hot rows or static rows.
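As a rough back-of-the-envelope illustration of why the row cache is so space-intensive for wide rows, the sketch below multiplies out a lower bound on the memory needed. All of the figures (1000 cached rows, one million columns per row, 50 bytes per column) are made-up assumptions, and the helper `row_cache_bytes` is hypothetical; real overhead per cached column would be higher.

```python
def row_cache_bytes(cached_rows, columns_per_row, avg_column_bytes):
    """Lower-bound estimate: the cache holds every column of every cached row."""
    return cached_rows * columns_per_row * avg_column_bytes


# Assumed workload figures, for illustration only.
wide = row_cache_bytes(cached_rows=1000, columns_per_row=1_000_000, avg_column_bytes=50)
small = row_cache_bytes(cached_rows=1000, columns_per_row=20, avg_column_bytes=50)

print(f"wide rows:  {wide / 1e9:.1f} GB")
print(f"small rows: {small / 1e6:.1f} MB")
```

Under these assumptions the same number of cached rows costs tens of gigabytes for the wide family and about a megabyte for the small one, which matches the advice to reserve the row cache for small, hot, or static rows.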
