Need help with riak-js
I'm a newbie with node.js and Riak, trying to use riak-js. I wrote the following CoffeeScript to create N entries with the squares of the integers 1..N. The script works fine for N=10: if I put a console.log() callback in the db.get() call, I can print the squares of 1..10.
db = require('riak-js').getClient({debug:false})
N = 10
for i in [1..N]
  db.save('Square', String(i), String(i*i))
for i in [1..N]
  db.get('Square', String(i))
My problem is that when I put N=1000 it takes about 10 seconds for my script to complete. Is this normal? I was expecting something well under 1 sec. I have a single Riak node on my local machine, an Acer Aspire 5740, i3 CPU and 4GB RAM, with Ubuntu 10.04. For a RAM-only store, I have set storage_backend in $RIAK/rel/riak/etc/app.config to riak_kv_ets_backend. The riak-admin status command confirms this setting.
Q1: Perhaps riak-js is setting some default disk-based backend for my bucket? How do I find out/override this?
Q2: I don't think it's a node.js issue, but am I doing something wrong in asynchronous usage?
A1: riak-js does not use any hidden settings; it is up to you to configure your Riak nodes.
A2: Your script seems fine; there's nothing you're doing wrong.
The truth is I haven't started benchmarking or seriously considering performance issues.
That said, every request is queued internally and issued serially. It makes the API simpler and you don't run into race conditions, but it has its limitations. Ideally I want to build a wrapper around riak-js that will take care of:
- Holding several instances to make requests in parallel
- Automatically reconnecting to other nodes in the cluster when one goes down
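A rough idea of what the first item of such a wrapper might look like, written as plain JavaScript. This is only a sketch, not part of riak-js: `createPool` and `createFakeClient` are hypothetical names, and the fake client merely records calls so the example runs without a Riak node. Instead of plain rotation it routes by key, so that a save and a later get for the same key share one client's queue; that relies on each client issuing its own requests in order, as described above.

```javascript
// Hypothetical helper (not part of riak-js): spread requests over several clients.
// Each client is assumed to expose save(bucket, key, value, ...) and get(bucket, key, ...).
function createPool(clients) {
  if (clients.length === 0) throw new Error('pool needs at least one client');

  // Map a key to a client index by summing its character codes, so the
  // same key is always handled by the same client.
  const indexFor = (key) => {
    let sum = 0;
    for (const ch of String(key)) sum += ch.charCodeAt(0);
    return sum % clients.length;
  };

  return {
    save: (bucket, key, ...rest) => clients[indexFor(key)].save(bucket, key, ...rest),
    get: (bucket, key, ...rest) => clients[indexFor(key)].get(bucket, key, ...rest),
  };
}

// Stand-in client that only records the calls it receives.
function createFakeClient(name, log) {
  return {
    save: (bucket, key) => log.push(`${name} save ${bucket}/${key}`),
    get: (bucket, key) => log.push(`${name} get ${bucket}/${key}`),
  };
}

const log = [];
const pool = createPool([createFakeClient('a', log), createFakeClient('b', log)]);
for (let i = 1; i <= 4; i++) pool.save('sq', String(i), String(i * i));
for (let i = 1; i <= 4; i++) pool.get('sq', String(i));
console.log(log);
```

With real clients, the array passed to `createPool` would hold the objects returned by `getClient` in place of the fakes.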
Your example runs in ~5sec on my MBP (using Bitcask).
=> time coffee test.coffee
real 0m5.181s
user 0m1.245s
sys 0m0.369s
Just as a proof of concept, take a look at this:
dbs = [require('riak-js').getClient({debug: false}), require('riak-js').getClient({debug: false})]
N = 1000
for i in [1..N]
  db = dbs[i % 2]
  db.save('sq', String(i), String(i*i))
for i in [1..N]
  db = dbs[i % 2]
  db.get('sq', String(i))
Results:
=> time coffee test.coffee
real 0m3.341s
user 0m1.133s
sys 0m0.319s
This will improve by using more clients hitting the DB.
Otherwise the answer is the Protocol Buffers interface, no doubt about it. I couldn't get it running with your example so I'll have to dig into it. But that should be lightning fast.
Make sure you're running the latest Riak (there have been many performance improvements). Also take into account a little overhead for CoffeeScript compilation.
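To keep process start-up and CoffeeScript compilation out of the measurement, one option is to time the requests inside the script and report once the last callback has fired. The sketch below is plain JavaScript with a hypothetical helper, `runBatch`, and a stand-in operation, `fakeGet`, in place of a real database call; it assumes each operation invokes its callback exactly once.

```javascript
// Hypothetical helper: run `n` callback-style operations and call `done`
// with the elapsed milliseconds once every one of them has reported back.
function runBatch(n, op, done) {
  const start = process.hrtime.bigint();
  if (n === 0) return done(0);
  let pending = n;
  for (let i = 1; i <= n; i++) {
    op(i, () => {
      pending -= 1;
      if (pending === 0) {
        // hrtime is in nanoseconds; convert to milliseconds.
        done(Number(process.hrtime.bigint() - start) / 1e6);
      }
    });
  }
}

// Stand-in for a database call: completes on a later turn of the event loop.
const fakeGet = (i, cb) => setImmediate(cb);

runBatch(1000, fakeGet, (ms) => {
  console.log(`1000 callbacks completed in ${ms.toFixed(1)} ms`);
});
```

With a real client, `op` would wrap the actual request and pass `cb` through as its completion callback.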
Here is my test file:
db = require('../lib').getClient({debug:false})
N = if process.argv[2] then process.argv[2] else 10
for i in [1..N]
  db.save('Square', String(i), String(i*i))
for i in [1..N]
  db.get('Square', String(i))
After compiling, I get the following times:
$ time node test1.js 1000
real 0m3.759s
user 0m0.823s
sys 0m0.421s
After running many iterations, my times were similar at that volume regardless of backend. I tested ets and dets. The OS caches your disk blocks on the first run at a particular volume, so subsequent runs are faster.
Following up on frank06's answer, I would also look into connection handling. This is not so much an issue with Riak as with how riak-js sets up its connections. Also note that in Riak all nodes are the same, so if you had a three-node cluster you would create connections to all three nodes and round-robin them in some fashion. The Protobuf API is the way to go, but it requires some extra care in setting up.
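One possible shape for that round-robin, as a plain JavaScript sketch. `createRotation` is a hypothetical helper and the node addresses are made up; the client factory is passed in so the example runs without riak-js, and here it returns a label instead of opening a connection.

```javascript
// Hypothetical helper: build one client per node and hand them out in rotation.
function createRotation(nodes, makeClient) {
  if (nodes.length === 0) throw new Error('need at least one node');
  const clients = nodes.map(makeClient);
  let cursor = 0;
  return function nextClient() {
    const client = clients[cursor];
    cursor = (cursor + 1) % clients.length; // wrap around after the last node
    return client;
  };
}

// Made-up addresses for a three-node cluster.
const nodes = [
  { host: '10.0.0.1', port: 8098 },
  { host: '10.0.0.2', port: 8098 },
  { host: '10.0.0.3', port: 8098 },
];

// Stand-in factory: a label per node rather than a real connection.
const nextClient = createRotation(nodes, (node) => `${node.host}:${node.port}`);
const order = [1, 2, 3, 4].map(() => nextClient());
console.log(order);
```

With riak-js, the factory would presumably call `getClient` with each node's host and port, assuming it accepts those options. Note that plain rotation does not keep requests for one key on the same connection, so a get may be issued before the save for that key has finished.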