riak backup solution for a single bucket
What开发者_运维技巧 are your recommendations for solutions that allow backing up [either by streaming or snapshot] a single riak bucket to a file?
Backing up just a single bucket is going to be a difficult operation in Riak.
All of the solutions will boil down to the following two steps:
List all of the objects in the bucket. This is the tricky part, since there is no "manifest" or a list of contents of any bucket, anywhere in the Riak cluster.
Issue a GET to each one of those objects from the list above, and write it to a backup file. This part is generally easy, though for maximum performance you want to make sure you're issuing those GETs in parallel, in a multithreaded fashion, and using some sort of connection pooling.
As far as listing all of the objects, you have one of three choices.
One is to do a Streaming List Keys operation on the bucket via HTTP (e.g. /buckets/bucket/keys?keys=stream
) or Protocol Buffers -- see http://docs.basho.com/riak/latest/dev/references/http/list-keys/ and http://docs.basho.com/riak/latest/dev/references/protocol-buffers/list-keys/ for details. Under no circumstances should you do a non-streaming regular List Keys operation. (It will hang your whole cluster, and will eventually either time out or crash once the number of keys grows large enough).
Two is to issue a Secondary Index (2i) query to get that object list. See http://docs.basho.com/riak/latest/dev/using/2i/ for discussion and caveats.
And three would be if you're using Riak Search and can retrieve all of the objects via a single paginated search query. (However, Riak Search has a query result limit of 10,000 results, so, this approach is far from ideal).
For an example of a standalone app that can backup a single bucket, take a look at Riak Data Migrator, an experimental Java app that uses the Streaming List Keys approach combined with efficient parallel GETs.
The Basho function contrib has an erlang solution for backing up a single bucket. It is a custom function but it should do the trick.
http://contrib.basho.com/bucket_exporter.html
As far as I know, there's no automated solution to backup a single bucket in Riak. You'd have to use the riak-admin
command line tool to take care of backing up a single physical node. You could write something to retrieve all keys in a single bucket and using low r values if you want it to be fast but not secure (r = 1).
Buckets are a logical namespace, all of the keys are stored in the same bitcask structure. That's why the only way to get just a single node is to write a tool to stream them yourself.
精彩评论