What's the catch with Cassandra? [closed]
OK, I was reading about Cassandra, and every article I read mentioned that writes in Cassandra are very "fast" due to eventual consistency.
I set up Cassandra on a Linux box, created a schema, and wrote a client in C# using the FluentCassandra client. Well, that didn't work, because I wasn't able to access the remote Cassandra instance via the FluentCassandra client.
So I installed Cassandra on Windows, created the schema, etc.
Next, I inserted 1 million entries into Cassandra, which took about 12 minutes. The client and server are on the same machine, a quad core with 8 GB of RAM.
This isn't fast. I did a similar test with MongoDB, which took 4 minutes to write 1 million documents.
I did a similar test with Objectivity OODBMS; it took 30 seconds to insert 1 million objects.
What's the catch with Cassandra? It wasn't fast according to my test. Would it behave differently on a Linux server, with a different client such as Java?
I haven't used Cassandra beyond doing a bit of research on it, but I have used MongoDB. Hopefully these thoughts/notes will help.
On a standalone machine, using mongoimport, I was able to load about 24 million documents into MongoDB in about 6 minutes. Your 4 minutes to write 1 million does seem slow. Factors could be:

- Disk speed.
- How you are inserting. If you insert one document at a time, it will be slower, especially if you use SafeMode (I don't know if Cassandra has the same kind of thing). You should instead insert via one of the batch APIs, e.g. InsertBatch on the C# driver, as in the sketch below.

The same kind of thing would be true for Cassandra: one at a time is slow, batched inserts are faster.
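For illustration, here is a minimal sketch of what a batched load might look like with the legacy MongoDB C# driver. The connection string, database/collection names, and document shape are made up for the example, not taken from your test:

```csharp
// Sketch: batched inserts with the legacy MongoDB C# driver.
// "test"/"entries" and the document shape are assumptions.
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

class Program
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var db = client.GetServer().GetDatabase("test");
        var collection = db.GetCollection<BsonDocument>("entries");

        const int batchSize = 1000; // one round trip per batch, not per doc
        var batch = new List<BsonDocument>(batchSize);

        for (var i = 0; i < 1000000; i++)
        {
            batch.Add(new BsonDocument { { "_id", i }, { "value", "payload " + i } });
            if (batch.Count == batchSize)
            {
                collection.InsertBatch(batch); // one call instead of 1000
                batch.Clear();
            }
        }
        if (batch.Count > 0)
        {
            collection.InsertBatch(batch); // flush the remainder
        }
    }
}
```

The point is simply to amortize the per-request overhead: 1 million single inserts means 1 million round trips, whereas batches of 1000 cut that to 1000 round trips.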
Obviously, on a standalone machine you will have contention between the client and the server, which could be a factor.
The thing to note is that technologies like MongoDB and Cassandra make it very easy to scale out. For example, in MongoDB terminology, you can scale your writes (i.e. increase throughput) by using sharding. Especially when you get to larger data volumes, being able to have a dozen nodes all accepting writes at the same time is obviously going to help the I/O situation and increase write throughput. Likewise, you can scale reads with replica sets. It's this ability to easily add nodes to scale out writes and reads that really gives you the full (and fair) picture of these technologies.
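On the Cassandra side, I can't speak for FluentCassandra, but as a hedged sketch using the DataStax C# driver (a different client from the one you tried; the keyspace and table names here are made up), the usual way to speed up a bulk load is to prepare the statement once and keep a window of asynchronous inserts in flight, rather than inserting one row synchronously at a time:

```csharp
// Sketch: concurrent async inserts with the DataStax C# driver.
// Keyspace "demo" and table "entries" are assumptions.
using System.Collections.Generic;
using System.Threading.Tasks;
using Cassandra;

class Program
{
    static async Task Main()
    {
        var cluster = Cluster.Builder()
            .AddContactPoint("127.0.0.1")
            .Build();
        var session = await cluster.ConnectAsync("demo");

        // Prepare once; binding a prepared statement is much cheaper
        // than re-parsing the CQL string a million times.
        var insert = await session.PrepareAsync(
            "INSERT INTO entries (id, value) VALUES (?, ?)");

        const int window = 256; // cap on concurrent in-flight requests
        var inFlight = new List<Task>(window);

        for (var i = 0; i < 1000000; i++)
        {
            inFlight.Add(session.ExecuteAsync(insert.Bind(i, "payload " + i)));
            if (inFlight.Count == window)
            {
                await Task.WhenAll(inFlight);
                inFlight.Clear();
            }
        }
        await Task.WhenAll(inFlight); // drain the last partial window
    }
}
```

Note that for Cassandra specifically, many concurrent asynchronous inserts are generally recommended over large multi-partition batches, which exist mainly for atomicity rather than throughput.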
In summary, my question back to you would be: how are you inserting those documents? Is it done in the most efficient, batched manner?