Cassandra = Does ReadRepair prevent Scaling Reads?

2023-02-28 08:03 Author:

Cassandra has the option to enable "ReadRepair". A read is sent to all replicas, and if one is stale, it is fixed/updated. But because all replicas receive every read, at some point the nodes will reach IO saturation. Since ALL replica nodes always receive the read, adding further nodes will not help, as they also receive all reads (and will be saturated at once)?

Or does Cassandra offer some "tunability" to configure ReadRepair so that it does not go to all of the nodes (or offer any other kind of "replication" that allows true read scaling)?

thanks!! jens

Update: A concrete example, as I still do not understand how it will work in practice.

9 Cassandra "boxes/servers"
3 replicas (N=3) => every row is written to 2 additional nodes (3 boxes hold the data in total)
ReadRepair enabled
The row in question (let's say Customer1) is highly trafficked
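To make the setup concrete, here is a minimal Python sketch of SimpleStrategy-style placement. The row key is hashed onto a token ring. The node owning that token range becomes the first replica, and the next N - 1 nodes clockwise hold the other copies. The evenly spaced tokens, the md5-based `token_for` hash and the `box1`..`box9` names are illustrative assumptions. They are not Cassandra's actual partitioner or API.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2**32  # illustrative token range, not Cassandra's real one

def build_ring(node_names):
    """Give each node one evenly spaced token; returns (token, node) pairs sorted by token."""
    step = TOKEN_SPACE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]

def token_for(key):
    """Hash a row key to a token in [0, TOKEN_SPACE) (md5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

def replicas_for(ring, key, rf):
    """SimpleStrategy-like placement: the first node whose token is >= the key's
    token is the primary replica; the next rf - 1 nodes clockwise hold the copies."""
    if rf > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)  # wrap past the last token
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

nodes = [f"box{i}" for i in range(1, 10)]
print(replicas_for(build_ring(nodes), "Customer1", rf=3))  # 3 of the 9 boxes
```

Under this model, only those three boxes ever serve reads for Customer1, however many other boxes the cluster has.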

1.) The first time I write the row "Customer1" to Cassandra, it will eventually be available on all 3 nodes.

2.) Now I query the system with thousands of requests per second for Customer1 (and, to make it clearer, with all caching disabled).

3.) The read will always be dispatched to all 3 nodes. (The first request, to the nearest node, is a full data request; the two additional requests are only "checksum" requests.)
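Step 3 describes one data request plus checksum ("digest") requests. Below is a simplified Python sketch of that read path. It assumes every replica already holds the key and that conflicts are resolved by the newest write timestamp. `Cell`, `digest` and `read_with_repair` are hypothetical names. A real coordinator also waits for consistency-level acknowledgements and performs the repair in the background.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp; the newest write wins on conflict

def digest(cell):
    """Checksum a digest replica returns instead of the full data."""
    return hashlib.md5(f"{cell.value}:{cell.timestamp}".encode("utf-8")).hexdigest()

def read_with_repair(replicas, key, counters):
    """replicas: one dict per replica node, mapping key -> Cell.
    counters: local reads served per replica (the disk IO from step 4).
    replicas[0] gets the data request; the others get digest requests."""
    data_cell = replicas[0][key]
    counters[0] += 1
    mismatch = False
    for i in range(1, len(replicas)):
        counters[i] += 1  # computing a digest still requires a local read of the row
        if digest(replicas[i][key]) != digest(data_cell):
            mismatch = True
    if not mismatch:
        return data_cell.value
    # Digests disagree: resolve by newest timestamp and write the winner back to
    # stale replicas (the extra full-data round trip is not counted here).
    newest = max((store[key] for store in replicas), key=lambda c: c.timestamp)
    for store in replicas:
        if store[key].timestamp < newest.timestamp:
            store[key] = newest
    return newest.value

replicas = [{"Customer1": Cell("v2", 2)}, {"Customer1": Cell("v1", 1)}, {"Customer1": Cell("v2", 2)}]
counters = [0, 0, 0]
print(read_with_repair(replicas, "Customer1", counters), counters)  # v2 [1, 1, 1]
```

The counters show why the question's concern is valid: every replica performs a local read for every client read, even though only one of them returns the full data.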

4.) As we are querying with thousands of requests, we reach the IO limit of all replicas! (The IO is the same on all 3 nodes; only the bandwidth is much smaller on the checksum nodes.)

5.) I add 3 more boxes (so we have 12 boxes in total):

A) These boxes do NOT have the data yet (which they need in order to help scale linearly). I first have to get the Customer1 record onto at least one of these new boxes. => This means I have to change the replication factor to 4 (OR is there any other way to get the data to another box?)

And now we have the same problem. The replication factor is now 4, and all 4 boxes will receive the read (repair) request for this highly trafficked Customer1 row. This does not scale. Scaling would only work if we had a copy that does NOT receive the ReadRepair request.
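Steps 4 and 5 amount to a simple load model. Let R be the client read rate for the hot row, N the replication factor, and C the reads per second one node can serve before its disk saturates. These symbols are assumptions. A digest read is also assumed to cost the same disk IO as a data read, as step 4 says.

```latex
% Read repair on every read: each of the N replicas serves every client read
L_{\text{replica}} = R \quad\Longrightarrow\quad R_{\max} = C \qquad (\text{independent of } N)

% Consistency level ONE, no read repair, reads spread evenly over the N replicas
L_{\text{replica}} = \frac{R}{N} \quad\Longrightarrow\quad R_{\max} = N\,C
```

So for a single hot row, raising the replication factor from 3 to 4 only raises the ceiling if not every read touches every replica.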

What is wrong in my understanding? My conclusion: with standard ReadRepair the system will NOT scale linearly (for a single highly trafficked row), because any boxes added to hold this row also receive the ReadRepair requests for it...

Thanks very much!!! Jens

Adding further nodes will help (in most situations). There will only be N read repair "requests" during a read, where N is the ReplicationFactor (the number of replicas; NB: not the number of nodes in the entire cluster). So a new node will only be included in a read / read repair if the requested data falls within that node's key range (i.e. the node holds a replica of the data).
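The answer's point is that each read touches only the N replicas of its key, so with many different keys the load spreads over the whole cluster. Here is a rough Python simulation. It assumes read repair runs on every read and uses a hypothetical modulo-based `replicas_for` placement instead of Cassandra's token ranges.

```python
import hashlib
from collections import Counter

def replicas_for(key, num_nodes, rf):
    """Hypothetical placement: hash the key to a primary node index, then take
    the next rf - 1 nodes around the ring. (Plain modulo remaps keys when nodes
    are added; Cassandra's token ranges avoid that, which this sketch ignores.)"""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    primary = h % num_nodes
    return [(primary + i) % num_nodes for i in range(rf)]

def per_node_reads(keys, num_nodes, rf):
    """Count local reads per node when every read goes to all rf replicas."""
    load = Counter()
    for key in keys:
        for node in replicas_for(key, num_nodes, rf):
            load[node] += 1
    return load

keys = [f"customer{i}" for i in range(10_000)]
for n in (9, 12):
    load = per_node_reads(keys, n, rf=3)
    print(f"{n} nodes: busiest node serves {max(load.values())} of {sum(load.values())} reads")
```

With many keys, the busiest node's load drops as nodes are added. A single hot key behaves differently: `per_node_reads(["Customer1"] * 1000, 12, 3)` still lands on exactly 3 nodes at 1000 reads each, which is the case the question describes.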

There is also the read_repair_chance tunable per ColumnFamily, but that is a more advanced topic and doesn't change the fundamental equation that you should scale reads by adding more nodes, rather than de-tuning read repair.
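`read_repair_chance` controls how often a read also sends digest requests to the replicas the consistency level does not need. Below is a back-of-the-envelope Python model for one hot row. It assumes that reads at consistency level ONE go to a single replica chosen evenly at random when read repair is not triggered. In practice the snitch prefers the closest or fastest replica. The function names are hypothetical.

```python
import random

def expected_replica_load(reads_per_sec, rf, rr_chance):
    """Expected local reads/s on each replica of one hot row, assuming CL ONE
    reads spread evenly over the rf replicas and, with probability rr_chance,
    digest requests to the other rf - 1 replicas as well."""
    data_share = reads_per_sec / rf
    digest_share = rr_chance * reads_per_sec * (rf - 1) / rf
    return data_share + digest_share

def simulate_replica_load(reads, rf, rr_chance, seed=0):
    """Monte Carlo check of the formula above; returns reads served per replica."""
    rng = random.Random(seed)
    load = [0] * rf
    for _ in range(reads):
        data_node = rng.randrange(rf)  # replica chosen for the data request
        load[data_node] += 1
        if rng.random() < rr_chance:  # read repair triggered for this read
            for node in range(rf):
                if node != data_node:
                    load[node] += 1
    return load

for rf in (3, 4):
    print(rf, expected_replica_load(1000, rf, 1.0), expected_replica_load(1000, rf, 0.1))
```

With `rr_chance=1.0`, every replica serves every read regardless of `rf`, which is the question's scenario. With `rr_chance=0.1`, the expected load per replica drops to about 400 reads/s for `rf=3` and 325 reads/s for `rf=4`. Under these assumptions, extra replicas then do add read capacity for that row.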

You can read more about replication and consistency in Ben's slides.
