
HBase performance

I am using Spring + DataNucleus JDO + HBase. HBase is running in fully distributed mode on two nodes. I am facing serious performance issues here.

My webapp can be considered a pinger: it just keeps pinging URLs and stores their responses. Hence my app runs multiple threads that INSERT into the db. I have observed that once the number of concurrent writes exceeds around 20, the inserts start taking a lot of time (some take as long as 1000 secs). When this happens, READS start failing too, and my webapp is not able to extract any data from the db (my webapp hangs). I am not much of a NoSQL db guy and hence do not know where to start looking for performance problems.

My major configurations are:

ZooKeeper quorum size: 1
HBase regionservers: 2
Data Nodes: 2
hbase.zookeeper.property.maxClientCnxns: 400
replication factor: 3

Do I need to increase the heap size for HBase? Should high WRITE throughput have an effect on READs?

Am I doing something wrong with the configuration? It seems writing to a file would be faster than writing data to HBase. This is my last shot at HBase. Please help.


The big problem that I see is that you are running HBase on 2 nodes with a replication factor of 3 (in effect just 2, as there are only 2 nodes to replicate to). This means every write must be replicated to both nodes. HBase really needs at least 5 or so nodes to get going.

It sounds like you are filling up your first region and it is splitting; during the split, once the MemStore fills up, writes will start blocking. You should look into creating your table pre-split into multiple regions, which will give you an even distribution of writes.

I recommend taking a look at the HBase book's chapter on performance, specifically the part on pre-splitting tables.
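As a rough illustration of pre-splitting, here is a minimal sketch that computes evenly spaced split keys in plain Java. It assumes every row key starts with one uniformly distributed byte (for example, the first byte of a hash of the URL), which is not something stated in the question; the class name SplitKeys and the method evenSplits are made up for this example.

```java
// Hypothetical helper, not part of HBase.
// Assumption: every row key begins with one uniformly distributed byte,
// e.g. the first byte of a hash of the pinged URL.
final class SplitKeys {

    private SplitKeys() {
    }

    // Returns numRegions - 1 one-byte split keys that divide the
    // 0x00..0xFF range of the first key byte into equal slices.
    static byte[][] evenSplits(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 1..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between slice i-1 and slice i, as an unsigned value.
            int boundary = i * 256 / numRegions;
            // The cast keeps the bit pattern; HBase orders row keys as
            // unsigned bytes, so 0x80 still sorts after 0x7F.
            splits[i - 1] = new byte[] { (byte) boundary };
        }
        return splits;
    }
}
```

The resulting array is the kind of value you would hand to the createTable overload that accepts split keys (HBaseAdmin.createTable(HTableDescriptor, byte[][]) in the older client API). If your row keys are not evenly distributed on their first byte, the split points would need to be chosen from the real key distribution instead.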

You should also use compression. Make sure you get native compression working (gzip, lzo or snappy); don't use the pure Java compression, otherwise you'll be really, really slow. The link discusses that a bit.


If you're going to write to HBase using multiple threads, you need to make sure you are reusing your HBaseConfiguration as often as possible. Otherwise, each thread makes a new connection and ZK will eventually stop offering connections until old ones close.

A quick solution is to let a singleton handle passing the configuration to your HTable objects. This should guarantee the same configuration is used and will minimize your concurrent connections.
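The singleton idea could look roughly like the sketch below. To keep it runnable without the HBase jars, it is written as a generic lazy holder; the class name SharedInstance is hypothetical, and wiring it to HBaseConfiguration.create() is shown only in a comment as an assumption about how you would use it.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of HBase: builds one shared object on
// first use and hands the same instance to every thread afterwards.
final class SharedInstance<T> {

    private final Supplier<T> factory;
    private volatile T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking: only the first caller runs the factory.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

// Assumed wiring in an application that has HBase on the classpath:
//   static final SharedInstance<Configuration> CONF =
//       new SharedInstance<>(HBaseConfiguration::create);
//   ... new HTable(CONF.get(), "myTable") in each worker thread ...
```

Each writer thread would then call get() instead of creating its own configuration, so all HTable objects are built from the same instance. The table name "myTable" is a placeholder.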
