What's the catch with Cassandra? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

OK. I was reading about Cassandra, and every article I read mentioned that writes in Cassandra are very "fast" due to eventual consistency.

I set up Cassandra on a Linux box, created a schema, and wrote a C# client using the FluentCassandra client library. Well, it didn't work, because I wasn't able to access the remote Cassandra instance via the FluentCassandra client.

So I installed Cassandra on Windows, created the schema, etc.

Next, I inserted 1 million entries into Cassandra, which took about 12 minutes. The client and server are on the same machine, a quad-core box with 8 GB of RAM.

This isn't fast. I did a similar test with MongoDB, which took 4 minutes to write 1 million documents.

I did a similar test with the Objectivity OODBMS; it took 30 seconds to insert 1 million objects.
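
Converting the three timings above into average insert rates makes the gap easier to compare. This is a small sketch that only does the arithmetic on the numbers reported in the question; the labels and the `writes_per_second` helper are illustrative, not part of any benchmark tool.

```python
# Average throughput implied by the timings reported in the question.
# Each entry: (label, number of records inserted, elapsed seconds).
RUNS = [
    ("Cassandra (C# client, same machine)", 1_000_000, 12 * 60),
    ("MongoDB", 1_000_000, 4 * 60),
    ("Objectivity OODBMS", 1_000_000, 30),
]


def writes_per_second(records: int, seconds: float) -> float:
    """Average insert rate over the whole run (records / elapsed time)."""
    return records / seconds


if __name__ == "__main__":
    for label, records, seconds in RUNS:
        rate = writes_per_second(records, seconds)
        print(f"{label:38s} {rate:>10,.0f} writes/s")
```

That works out to roughly 1,400 writes/s for Cassandra, about 4,200 for MongoDB and about 33,000 for Objectivity, i.e. gaps of about 3x and 24x. These are single-machine averages and say nothing about how the insert loop itself was written.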

What's the catch with Cassandra? It wasn't fast according to my test.

Would it behave differently on a Linux server with a different client, such as Java?


I haven't used Cassandra beyond doing a bit of research on it, but I have used MongoDB. Hopefully these thoughts/notes will help.

On a standalone machine, using mongoimport, I was able to load about 24 million documents into MongoDB in about 6 minutes. Your 4 minutes to write 1 million does seem slow. Factors could be disk speed and how you are inserting: for example, if you insert one document at a time, it will be slower, especially if you use SafeMode (I don't know whether Cassandra has the same kind of thing). You should instead insert via one of the batch APIs (e.g. InsertBatch on the C# driver). The same would be true for Cassandra (one by one = slow, batched inserts = faster). That said, it's the ability to easily add nodes to scale out writes and reads that really gives you the full (and fair) picture of these technologies.
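
To see why one-at-a-time inserts are slow, it helps to count round trips: every request pays a fixed network/acknowledgement overhead, and batching divides the number of requests by the batch size. The sketch below uses a hypothetical `CountingClient` that stores nothing and only counts requests; its `insert_one`/`insert_many` method names mirror pymongo's, and play the same role as InsertBatch in the C# driver. The batch size of 1000 is an arbitrary assumption, and how much a real server gains from large batches depends on the database.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class CountingClient:
    """Hypothetical stand-in for a database client.

    It stores nothing; it only counts how many requests (round trips)
    would be sent and how many documents they carry.
    """

    def __init__(self) -> None:
        self.requests = 0
        self.documents = 0

    def insert_one(self, doc: dict) -> None:
        self.requests += 1
        self.documents += 1

    def insert_many(self, docs: List[dict]) -> None:
        self.requests += 1
        self.documents += len(docs)


def load_one_by_one(client: CountingClient, docs: Iterable[dict]) -> None:
    # One request per document: the per-request overhead is paid N times.
    for doc in docs:
        client.insert_one(doc)


def load_batched(client: CountingClient, docs: Iterable[dict], batch_size: int = 1000) -> None:
    # One request per batch: the overhead is paid about N / batch_size times.
    for batch in chunked(docs, batch_size):
        client.insert_many(batch)


if __name__ == "__main__":
    n = 1_000_000
    single, batched = CountingClient(), CountingClient()
    load_one_by_one(single, ({"_id": i} for i in range(n)))
    load_batched(batched, ({"_id": i} for i in range(n)), batch_size=1000)
    print(single.requests, batched.requests)  # 1000000 1000
```

Both loaders deliver the same 1 million documents; the batched one sends 1,000 requests instead of 1,000,000.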

Obviously, on a standalone machine you will have contention (the client and server compete for the same CPU, memory, and disk), which could be a factor.

The thing to note is that technologies like MongoDB and Cassandra make it very easy to scale out. For example, in MongoDB terminology, you can scale your writes (i.e. increase throughput) by using sharding. Especially at larger data volumes, having a dozen nodes all accepting writes at the same time spreads the I/O load and increases write throughput. Likewise, you can scale reads with replica sets.
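
The scale-out argument rests on partitioning: each key is routed to one node, so with a dozen nodes each one handles roughly a twelfth of the writes. The sketch below uses simple hash-mod-N routing as an assumption for illustration; it is not how either database actually assigns keys (Cassandra uses a token ring and MongoDB sharding uses chunk ranges, both designed so that adding a node moves only part of the data). The `user-…` keys and the node count are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def node_for_key(key: str, num_nodes: int) -> int:
    """Route a key to a node with simple hash-mod-N partitioning.

    Simplification: unlike a token ring or chunk ranges, changing
    num_nodes here remaps most keys.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def spread_writes(keys: Iterable[str], num_nodes: int) -> Counter:
    """Count how many writes each node would receive."""
    return Counter(node_for_key(k, num_nodes) for k in keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(120_000)]
    for nodes in (1, 12):
        load = spread_writes(keys, nodes)
        busiest = max(load.values())
        print(f"{nodes:2d} node(s): busiest node takes {busiest:,} of {len(keys):,} writes")
```

With one node, that node takes every write; with twelve, the busiest node takes only about a twelfth, which is where the extra aggregate write throughput comes from.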

In summary, my question would be: how are you inserting those documents? Is it done in the most efficient (batched) manner?
