
What is a good bulk data loading tool for Cassandra?

I'm looking for a tool to load CSV into Cassandra. I was hoping to use RazorSQL for this but I've been told that it will be several months out.

What is a good tool?


1) If you have all the data to be loaded in place, you can try the sstableloader utility (available from Cassandra 0.8.x onwards) to bulk load it. For more details, see: cassandra bulk loader

2) From version 1.1.x onwards, Cassandra provides BulkOutputFormat for bulk loading data into Cassandra from a Hadoop job. For more details, see: Bulkloading to Cassandra with Hadoop

I'm dubious that tool support would help a great deal with this, since a Cassandra schema needs to reflect the queries that you want to run, rather than just being a generic model of your domain.

The built-in bulk loading mechanism for Cassandra is via BinaryMemtables: http://wiki.apache.org/cassandra/BinaryMemtable

However, whether you use this or the more usual Thrift interface, you still probably need to manually design a mapping from your CSV into Cassandra ColumnFamilies, taking into account the queries you need to run. A generic mapping from CSV to Cassandra may not be appropriate, since secondary indexes and denormalisation are commonly needed.
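
To make the denormalisation point concrete, here is a minimal Python sketch. It assumes a made-up orders CSV and two made-up queries ("orders for a customer" and "orders for a product"); none of the column or variable names come from Cassandra itself. It only shows how the same CSV row ends up written once per query pattern, which is the kind of mapping you would have to design by hand before loading.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV of orders; the column names are illustrative only.
CSV_TEXT = """order_id,customer,product,amount
1,alice,widget,10
2,bob,widget,5
3,alice,gadget,7
"""


def denormalise(csv_text):
    """Build one row-keyed view per query: orders by customer, orders by product."""
    by_customer = defaultdict(dict)
    by_product = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        order_id = row["order_id"]
        # Row key = the value the query filters on; column name = the order id.
        by_customer[row["customer"]][order_id] = row
        by_product[row["product"]][order_id] = row
    return dict(by_customer), dict(by_product)


by_customer, by_product = denormalise(CSV_TEXT)
print(sorted(by_customer["alice"]))
print(sorted(by_product["widget"]))
```

Each dictionary stands in for one ColumnFamily here. A generic CSV loader would produce only one of them, keyed by whatever column it happened to pick.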

  • For Cassandra 1.1.3 and higher, the CQL COPY command can import data into (or export data from) a table. According to the documentation, it is a good option if you are importing fewer than roughly 2 million rows. It is much easier to use than sstableloader and less error-prone: sstableloader requires you to create strictly formatted .db files, whereas the CQL COPY command accepts a delimited text file. Documentation here: http://www.datastax.com/docs/1.1/references/cql/COPY

  • For larger data sets, you should use sstableloader (http://www.datastax.com/docs/1.1/references/bulkloader). A working example is described here: http://www.datastax.com/dev/blog/bulk-loading
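
As a rough illustration of the COPY route, the Python sketch below reads the header line of a CSV and builds the matching COPY ... FROM statement to paste into cqlsh. The table name `users`, the file name `users.csv`, and the helper `build_copy_statement` are all hypothetical. It also assumes the CSV header names match the table's column names exactly. Check the HEADER and DELIMITER options against the COPY documentation for your cqlsh version before relying on them.

```python
import csv
import io


def build_copy_statement(table, csv_text, csv_path, delimiter=","):
    """Return a cqlsh COPY ... FROM statement with columns taken from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)  # the first line is assumed to hold the column names
    columns = ", ".join(name.strip() for name in header)
    return (
        f"COPY {table} ({columns}) FROM '{csv_path}' "
        f"WITH HEADER = true AND DELIMITER = '{delimiter}';"
    )


# Hypothetical sample data; in practice you would read the first line of the real file.
sample = "user_id,name,email\n1,Ann,ann@example.com\n"
statement = build_copy_statement("users", sample, "users.csv")
print(statement)
```

The script only generates the statement text; you still run it from a cqlsh session connected to the keyspace that contains the table.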






