How do you use the Cassandra tool sstableloader?
I'm trying to use sstableloader to load data into an existing Cassandra ring, but can't figure out how to actually get it to work. I'm running it on a machine that already has a running Cassandra node on it, and when I run it I get an error saying that port 7000 is already in use. That is the port the running Cassandra node uses for gossip.
So does that mean I can only use sstableloader on a machine that is in the same network as the target Cassandra ring, but isn't actually running a Cassandra node?
Any details would be useful, thanks.
I played around with sstableloader, read the source code, and finally figured out how to run sstableloader on the same machine that hosts a running Cassandra node. There are two key points to get this running. First, you need to create a copy of the Cassandra install folder for sstableloader. This is because sstableloader reads the yaml file to figure out which IP address to use for gossip, and the existing yaml file is already being used by Cassandra. Second, you'll need to create a new loopback IP address (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen on this IP address.
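To illustrate the yaml change, here is a minimal Java sketch that sets `listen_address` in the lines of the copied cassandra.yaml. The class and method names are made up for this example, and it assumes `listen_address` appears as an uncommented top-level key, as it does in the stock file; it is plain text substitution, not a real YAML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListenAddressPatcher {

    /**
     * Returns a copy of the given cassandra.yaml lines in which the top-level
     * listen_address entry points at newAddress. All other lines, including
     * commented-out ones, are copied unchanged.
     */
    public static List<String> withListenAddress(List<String> yamlLines, String newAddress) {
        List<String> patched = new ArrayList<>();
        for (String line : yamlLines) {
            if (line.startsWith("listen_address:")) {
                patched.add("listen_address: " + newAddress);
            } else {
                patched.add(line);
            }
        }
        return patched;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the copied install's cassandra.yaml.
        List<String> original = Arrays.asList(
                "cluster_name: 'Test Cluster'",
                "listen_address: localhost",
                "rpc_address: localhost");

        // 127.0.0.2 is the extra loopback address suggested above.
        for (String line : withListenAddress(original, "127.0.0.2")) {
            System.out.println(line);
        }
    }
}
```

For a real file you could read the lines with `Files.readAllLines` and write the result back with `Files.write`, touching only the copy and never the yaml of the running node.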
I wrote a tutorial going more into detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx
The Austin Cassandra Users Group just had a presentation on this: http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/
I have used the sstableloader utility provided in cassandra-0.8.4 to successfully load sstables into Cassandra. Based on some of the issues I ran into, here are a few tips:
If you are running it on a single machine, you have to create a copy of the Cassandra installation folder and run sstableloader from that copy. In the copied cassandra.yaml, change the listen address and the rpc address, and set the IP address of the running Cassandra node as the seed. Also check that the cluster name is the same in both cassandra.yaml files.
The sstables have to be in a directory whose name is the name of the keyspace.
sstableloader requires a directory containing a cassandra.yaml configuration file on the classpath.
Note that the schema for the column families to be loaded must be defined beforehand.
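The directory-naming tip is easy to get wrong, so a quick pre-flight check may help. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes that sstable data files end in `-Data.db`, which you should confirm for your Cassandra version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SSTableDirCheck {

    /** The keyspace name implied by the directory: its last path element. */
    public static String keyspaceOf(Path sstableDir) {
        return sstableDir.getFileName().toString();
    }

    /**
     * Returns true if sstableDir exists, is named after expectedKeyspace,
     * and contains at least one file matching *-Data.db.
     */
    public static boolean looksLoadable(Path sstableDir, String expectedKeyspace) {
        if (!Files.isDirectory(sstableDir)) {
            return false;
        }
        if (!keyspaceOf(sstableDir).equals(expectedKeyspace)) {
            return false;
        }
        try (DirectoryStream<Path> dataFiles =
                     Files.newDirectoryStream(sstableDir, "*-Data.db")) {
            return dataFiles.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SSTableDirCheck <sstable-dir> <keyspace>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.println(looksLoadable(dir, args[1])
                ? "Directory looks ready for sstableloader"
                : "Directory name or contents do not match keyspace " + args[1]);
    }
}
```

This does not verify that the schema exists on the cluster; that still has to be checked separately before loading.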
For reference, see "Using Cassandra SSTableloader for bulk loading data into Cassandra": http://ramuprograms.blogspot.com/2014/07/bulk-loading-data-into-cassandra-using.html
If you are looking to do this in Java, see the utility class below:
BulkWriterLoader
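import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.tools.BulkLoadException;
import org.apache.cassandra.tools.BulkLoader;
import org.apache.cassandra.tools.LoaderOptions;

// Build the same arguments you would pass to sstableloader on the command line.
// "params" is the utility class's own settings holder.
List<String> argList = new ArrayList<>();
argList.add("-v");                 // verbose output
argList.add("-d");                 // initial hosts to connect to
argList.add(params.hosts);
argList.add("-f");                 // path to cassandra.yaml
argList.add(params.cassYaml);
argList.add(params.fullpath);      // directory containing the sstables

LoaderOptions options = LoaderOptions.builder()
        .parseArgs(argList.toArray(new String[0]))
        .build();
try
{
    // Stream the sstables to the cluster.
    BulkLoader.load(options);
}
catch (BulkLoadException e)
{
    e.printStackTrace();
}
...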
The code will also generate the sstable files using the CQLSSTableWriter class.
Things have improved since then: the whole procedure of using sstableloader is now much easier, including an easier way to generate sstables with CQLSSTableWriter.
For all the details: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsBulkloader.html