How to increment a counter in Cassandra?
I'd like to use Cassandra to store a counter, for example how many times a given page has been viewed. The counter will never decrement. The value of the counter does not need to be exact, but it should be accurate over time.
My first thought was to store the value as a column and just read the current count, increment it by one, and then put it back. However, if another operation is also trying to increment the counter, I think the final value would just be the one with the latest timestamp.
Another thought would be to store each page load as a new column in a CF. Then I could just run get_count() on that key and get the number of columns. Reading through the documentation, it appears that it is not a very efficient operation at all.
Am I approaching the problem incorrectly?
Counters were added in Cassandra 0.8.
Use the incr command to increment the value of a counter column by 1:
[default@app] incr counterCF [ascii('a')][ascii('x')];
Value incremented.
[default@app] incr counterCF [ascii('a')][ascii('x')];
Value incremented.
Described here: http://www.jointhegrid.com/highperfcassandra/?p=79
Or it can be done programmatically:
// Assumes c is a connected Thrift Cassandra.Client, bucketByMinute and bucketByDay
// are SimpleDateFormat instances, and r carries the request's date and URL.
CounterColumn counter = new CounterColumn();
counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
counter.setValue(1);
ColumnParent cp = new ColumnParent("page_counts_by_minute");
c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
      cp, counter, ConsistencyLevel.ONE);
Described here: http://www.jointhegrid.com/highperfcassandra/?cat=7
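For completeness, reading the counter back over Thrift would look roughly like the sketch below. This is not from the linked post; it just reuses the connected client c, the SimpleDateFormat buckets, and the request object r assumed above.
ColumnPath path = new ColumnPath("page_counts_by_minute");
path.setColumn(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
ColumnOrSuperColumn result = c.get(
        ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
        path, ConsistencyLevel.ONE);
long views = result.getCounter_column().getValue();  // counter value as a long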
[Update] Looks like counter support will be ready for prime time in 0.8!
I definitely wouldn't use get_count, as that is an O(n) operation which is run every time you read the "counter." Worse than it being just O(n), it may span multiple nodes, which would introduce network latency. And finally, why tie up all that disk space when all you care about is a single number?
For right now, I wouldn't use Cassandra for counters at all. They are working on this functionality, but it's not ready for prime time yet.
https://issues.apache.org/jira/browse/CASSANDRA-1072
You've got a few options in the meantime.
1) (Bad) Store your count in a single record and have one and only one thread of your application be responsible for counter management.
2) (Better) Split the counter into n shards, and have n threads manage each shard as a separate counter. You can randomize which thread is used by your app each time for stateless load balancing across these threads. Just make sure that each thread is responsible for exactly one shard (see the sketch after this list).
3a) (Best) Use a separate tool that is either transactional (aka an RDBMS) or that supports atomic increment operations (memcached, redis).
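To make option 2 concrete, here is a rough sketch of a sharded counter over the Thrift API. The class name, the "page_counts" column family, and the shard column naming are all made up for illustration; the important property is that each shard has exactly one writer, so its read-increment-write never races with another writer.
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ShardedPageCounter {
    private final Cassandra.Client client;
    public ShardedPageCounter(Cassandra.Client client) { this.client = client; }

    // Thread i calls this with its own shard index; no other writer touches that column.
    public void increment(String page, int shard) throws Exception {
        long current = read(page, shard);
        Column col = new Column();
        col.setName(ByteBufferUtil.bytes("views_shard_" + shard));
        col.setValue(ByteBufferUtil.bytes(current + 1));
        col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds by convention
        client.insert(ByteBufferUtil.bytes(page), new ColumnParent("page_counts"),
                      col, ConsistencyLevel.QUORUM);
    }

    // Read a single shard, treating a missing column as zero.
    public long read(String page, int shard) throws Exception {
        ColumnPath path = new ColumnPath("page_counts");
        path.setColumn(ByteBufferUtil.bytes("views_shard_" + shard));
        try {
            ColumnOrSuperColumn cosc = client.get(ByteBufferUtil.bytes(page), path,
                                                  ConsistencyLevel.QUORUM);
            return ByteBufferUtil.toLong(ByteBuffer.wrap(cosc.getColumn().getValue()));
        } catch (NotFoundException e) {
            return 0L;
        }
    }

    // The total is just the sum over all shards.
    public long total(String page, int numShards) throws Exception {
        long sum = 0;
        for (int i = 0; i < numShards; i++) sum += read(page, i);
        return sum;
    }
}
The trade-off is that a read of the total touches numShards columns, but writes never contend with each other.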
[Update.2] I would avoid using a distributed lock (see memcached and zookeeper mutexes), as this is very intolerant of node failure or network partitioning if improperly implemented.
What I ended up doing was using get_count() and caching the result in a caching ColumnFamily.
This way I could get a general guess at the count but still get the exact count whenever I wanted.
Additionally, I was able to adjust how stale a count I was willing to accept on a per-request basis.
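A rough sketch of that pattern over the Thrift API, assuming a column family page_views holding one column per view and a made-up cache column family view_count_cache (names are illustrative, not from the answer):
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;
import org.apache.cassandra.utils.ByteBufferUtil;

class CachedPageViewCounter {
    private final Cassandra.Client client;
    CachedPageViewCounter(Cassandra.Client client) { this.client = client; }

    // Exact count: O(n) in the number of view columns, so call it sparingly.
    long exactCount(String page) throws Exception {
        SlicePredicate all = new SlicePredicate();
        all.setSlice_range(new SliceRange(ByteBufferUtil.bytes(""), ByteBufferUtil.bytes(""),
                                          false, Integer.MAX_VALUE));
        return client.get_count(ByteBufferUtil.bytes(page),
                                new ColumnParent("page_views"), all, ConsistencyLevel.ONE);
    }

    // Cached count: accept a result up to maxStaleMillis old, otherwise recount and re-cache.
    long cachedCount(String page, long maxStaleMillis) throws Exception {
        ColumnPath path = new ColumnPath("view_count_cache");
        path.setColumn(ByteBufferUtil.bytes("count"));
        try {
            ColumnOrSuperColumn cosc = client.get(ByteBufferUtil.bytes(page), path,
                                                  ConsistencyLevel.ONE);
            Column cached = cosc.getColumn();
            long ageMillis = System.currentTimeMillis() - cached.getTimestamp() / 1000;
            if (ageMillis <= maxStaleMillis)
                return ByteBufferUtil.toLong(ByteBuffer.wrap(cached.getValue()));
        } catch (NotFoundException ignored) { /* nothing cached yet */ }

        long fresh = exactCount(page);
        Column col = new Column();
        col.setName(ByteBufferUtil.bytes("count"));
        col.setValue(ByteBufferUtil.bytes(fresh));
        col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds by convention
        client.insert(ByteBufferUtil.bytes(page), new ColumnParent("view_count_cache"),
                      col, ConsistencyLevel.ONE);
        return fresh;
    }
}
The maxStaleMillis parameter is how the per-request staleness tolerance mentioned above would be expressed.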
We are going to address a similar problem by keeping the current value of a counter in a distributed cache (for example - memcached). When the counter is updated, we will store its value in Cassandra. Therefore even if some cache node fails, we will be able to get the value from the database.
This solution is not perfect. However, data such as a visit counter is not very sensitive, so minor inconsistencies are acceptable in my opinion.
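As a sketch of that design, using spymemcached for the cache side and the Thrift API for Cassandra; the key prefix and the "page_view_counts" column family are invented for illustration:
import net.spy.memcached.MemcachedClient;
import org.apache.cassandra.thrift.*;
import org.apache.cassandra.utils.ByteBufferUtil;

class WriteThroughCounter {
    private final MemcachedClient cache;
    private final Cassandra.Client cassandra;
    WriteThroughCounter(MemcachedClient cache, Cassandra.Client cassandra) {
        this.cache = cache;
        this.cassandra = cassandra;
    }

    long increment(String page) throws Exception {
        // Atomic increment in memcached; the third argument seeds the key with 1 if missing.
        long value = cache.incr("views:" + page, 1, 1L);

        // Write the latest value through to Cassandra so it survives a cache node failure.
        Column col = new Column();
        col.setName(ByteBufferUtil.bytes("views"));
        col.setValue(ByteBufferUtil.bytes(value));
        col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds by convention
        cassandra.insert(ByteBufferUtil.bytes(page),
                         new ColumnParent("page_view_counts"), col, ConsistencyLevel.ONE);
        return value;
    }
}
If memcached loses the key, it can be re-seeded from the Cassandra column before accepting new increments, which is what makes the minor inconsistency tolerable.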
Interestingly enough, I do not see anyone mentioning the possibility of counting on a per-app-computer basis. Say your app runs on 5 machines named a1, a2, ... a5. Then you can have a lock on a per-machine basis (i.e. a file you open with O_EXCL, or a lock used to wait for other instances to be done with the counter) and add either one row per machine or one column, depending on your implementation. Something like
machine_lock();
this_column_family[machine-name][my-counter] += 1;
machine_unlock();
That way, you get one counter per machine. When you need the total, you just read a1, a2, ... a5 and sum them.
total = 0;
foreach(machines as m) {
total += this_column_family[m][my-counter];
}
(this is pseudo code that would more or less work with libQtCassandra.)
This way you avoid a lock that locks all the nodes and yet you still get safe/consistent counting (obviously the read + sum is not perfect and it only gives you an approximation of the total, but it still remains consistent.)
I'm not too sure whether what Ben Burns pointed out in regard to having n shards and n threads would be the same thing, but it doesn't sound exactly like it to me.
And since 0.8.x, you can use Cassandra counters, which are certainly a lot easier to use, although they may not always fit your needs.