Cassandra node limitations
I would like to know whether Cassandra has any limitations on node hardware specs, for example a maximum amount of storage per node.
I intend to use a couple of nodes with 48 TB of storage each (24 x 2 TB hard drives, 7200 rpm) and good dual Xeon processors.
I have looked for such limitations but didn't find any material on this issue. Also, why is there so little buzz about Cassandra recently? It is maturing and is up to version 0.8, yet most articles and blogs still cover only 0.6.
Cassandra distributes its data by row, so the only hard limitation is that a row must be able to fit on a single node.
So the short answer is no.
The longer answer is that you'll want to make sure that you're setting up a separate storage area for your permanent data and your commit logs.
One other thing to keep in mind is that you'll still run into seek speed issues. One of the nice things about Cassandra is that you don't need a single node with that much data (and in fact it's probably not well advised: your storage will outpace your processing power). If you use smaller nodes (in terms of hard drive space), your storage and processing capabilities will scale together.
There are some notes here about large data set considerations.
48 TB of data per node is probably way too much. It will be much better to have more nodes with smaller amounts of data. Periodically you need to run nodetool repair, which involves reading all the data on the machine. If you are storing many terabytes of data on a machine, this will be very painful.
I would limit each node to around 1TB of data.
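To get a feel for why repair hurts at this scale, here is a rough back-of-the-envelope sketch. It only estimates the time for one sequential pass over the data; the 200 MB/s sustained throughput is an assumed figure for illustration, not a measured Cassandra number, and a real repair also does hashing, network transfer and competes with live traffic.

```python
def full_scan_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read data_tb terabytes once at a sustained rate in MB/s."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / throughput_mb_s / 3600


if __name__ == "__main__":
    ASSUMED_THROUGHPUT_MB_S = 200  # hypothetical sustained read rate for the node
    for size_tb in (1, 10, 48):
        hours = full_scan_hours(size_tb, ASSUMED_THROUGHPUT_MB_S)
        print(f"{size_tb:>2} TB per node: about {hours:.1f} hours for one full read")
```

Under that assumption, a single pass over 48 TB takes close to three days, while 1 TB takes well under two hours. Plug in your own measured throughput to get a figure for your hardware.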
See "How much data per node in Cassandra cluster?", which suggests that 1 to 10 TB per node is sensible, depending on your application. Cassandra will probably still work with 48 TB, but not optimally.
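If you want to turn a per-node target into a node count, a minimal sizing sketch could look like the following. The function name and the example numbers are made up for illustration, and it ignores free-space headroom for compaction, so treat the result as a lower bound.

```python
import math


def nodes_needed(raw_tb: float, replication_factor: int, per_node_tb: float) -> int:
    """Minimum node count so that no node holds more than per_node_tb of data.

    raw_tb is the size of the data set before replication.
    """
    total_tb = raw_tb * replication_factor  # every row is stored RF times
    by_capacity = math.ceil(total_tb / per_node_tb)
    # You also need at least as many nodes as replicas.
    return max(replication_factor, by_capacity)


if __name__ == "__main__":
    # Hypothetical example: 32 TB of raw data, RF 3, at most 5 TB per node.
    print(nodes_needed(raw_tb=32, replication_factor=3, per_node_tb=5))
```

The point is that the replication factor multiplies the stored volume, so the same data set spread over more, smaller nodes keeps each node within the range suggested above.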
Do you intend to use a replication factor of 1 or 2 (given the two nodes you mention above)?
Some operations (repair, compaction) may be extremely slow with that much data on a single node.
You should also be careful using large amounts of RAM with Cassandra. RAM is great for caching the data in SSTables, but giving the JVM too much heap space is counter-productive. Don't give the JVM much more than 12 GB of heap space, otherwise garbage collection will take too long and hinder performance. This is another reason why having more, smaller nodes is better in Cassandra.
DataStax, the principal vendor, recommends 3 to 5 TB per node.
See here:
https://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningHardware_c.html