
File I/O on NoSQL, especially HBase: is it recommended or not?

I'm new to NoSQL and I'm trying to use HBase for file storage. I'll store the files in HBase as binary data.

I don't need any statistics, only file storage.

Is it recommended? I'm worried about I/O speed.

The reason I'm using HBase for storage is that I have to use HDFS, but I can't install Hadoop on the client computer. Because of that, I tried to find a library that lets the client connect to HDFS and fetch files. I couldn't find one, so I chose HBase instead of a connection library.

In this situation, what should I do?


I don't know about Hadoop, but MongoDB has GridFS which is designed for distributed file storage which enables you to scale horizontally, get replication for "free" and so on.

http://www.mongodb.org/display/DOCS/GridFS

There will be some overhead from storing files in chunks in MongoDB, so if your load is low to medium and you need low response times, you will probably be better off using the file system directly. Performance will also vary between different driver implementations.
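To give a rough feel for the chunking overhead mentioned above, here is a minimal sketch in plain Python. It is not the GridFS or driver API; it only illustrates how a file gets split into fixed-size pieces, each of which would become its own document. The 255 KiB chunk size is an assumption based on the default in recent drivers (older ones used 256 KiB), and the function names are made up for illustration.

```python
import math
from typing import List

# Assumed chunk size: 255 KiB is the default in recent GridFS drivers;
# check your driver's documentation, since older versions used 256 KiB.
CHUNK_SIZE = 255 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a payload into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunk documents needed for a file of the given size in bytes."""
    return math.ceil(file_size / chunk_size)
```

Every read of a whole file has to fetch and reassemble `chunk_count(size)` documents, which is where the extra latency compared with a plain file system read comes from.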


I think the ability to mount HDFS as a regular file system should help you: http://wiki.apache.org/hadoop/MountableHDFS
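Once HDFS is mounted, the client only needs ordinary file operations and no Hadoop libraries. A minimal sketch, assuming the mount point is passed in as `mount_root` (the helper names and any path such as `/mnt/hdfs` are hypothetical, not something the wiki page prescribes):

```python
import shutil
from pathlib import Path


def store_file(local_path, mount_root, name):
    """Copy a local file into the mounted tree using plain sequential writes."""
    dest = Path(mount_root) / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_path, dest)
    return dest


def load_file(mount_root, name) -> bytes:
    """Read a whole file back from the mounted tree."""
    return (Path(mount_root) / name).read_bytes()
```

For example, `store_file("report.pdf", "/mnt/hdfs", "docs/report.pdf")` would copy the file under the mount. The sketch sticks to whole-file sequential reads and writes on purpose, since a FUSE mount of HDFS may not support everything a local disk does, such as rewriting part of a file in place.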


You certainly can use HBase to store files. It is perhaps not ideal, and based on your file size distribution you may want to tweak some of the settings. Compared with HDFS, it is probably a much better alternative for large numbers of files.

Settings to look out for:

  • max region size: You will likely want to turn this up to 4GB
  • max cell size: you will want to set this to 0 to disable this limit
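As a sketch of what those two settings might look like in practice, the snippet below renders them as `hbase-site.xml` style properties. The property names `hbase.hregion.max.filesize` and `hbase.client.keyvalue.maxsize` are my assumption about which settings are meant, so check them against the documentation for your HBase version; the helper function is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed property names; verify against your HBase version's documentation.
SETTINGS = {
    # Max region size: 4 GB expressed in bytes.
    "hbase.hregion.max.filesize": str(4 * 1024 ** 3),
    # Max cell size: 0 disables the client-side limit.
    "hbase.client.keyvalue.maxsize": "0",
}


def render_properties(settings):
    """Render a dict of settings as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SETTINGS))
```

The resulting `<property>` entries would be merged into the existing `hbase-site.xml` rather than replacing the file.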

You may also want to look at other kinds of alternatives (maybe even MapR).
