
NoSQL with my own custom binary files?

Originally, I had to deal with just 1.5 TB of data. Since I only needed fast reads and writes (without any SQL), I designed my own flat binary file format (implemented in Python) and easily (and happily) saved my data and manipulated it on one machine. For backup purposes, I added 2 machines as exact mirrors (using rsync).

My needs are now growing, and I have to build a solution that will scale to 20 TB of data (and beyond). I would be happy to keep using my flat file format for storage. It is fast, reliable and gives me everything I need.

What concerns me is replication, data consistency and so on across the network, since the data will obviously have to be distributed: not all of it can be stored on one machine.

Are there any ready-made solutions (Linux / Python based) that would let me keep using my file format for storage, yet handle the other components that NoSQL solutions normally provide (data consistency, availability, easy replication)?

Basically, all I want is to make sure that my binary files are consistent throughout my network. I am using a network of 60 Core Duo machines (each with 1 GB RAM and a 1.5 TB disk).


Approach: Distributed MapReduce in Python with the Disco Project

This seems like a good way to approach your problem. I have used the Disco Project for similar problems.

You can distribute your files among n machines (processes) and implement the map and reduce functions that fit your logic.
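To make the "distribute your files among n machines" part concrete, here is a minimal sketch of one way to decide which machines hold which file. It is plain Python and not part of Disco; it uses rendezvous (highest-random-weight) hashing, and the node names, the file name and the replica count of 3 are all made up for illustration.

```python
import hashlib

def assign_replicas(filename, machines, replicas=3):
    """Pick `replicas` distinct machines for a file via rendezvous hashing."""
    def score(machine):
        # Hash the (machine, file) pair; the highest scores win the file.
        digest = hashlib.sha256(f"{machine}:{filename}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    ranked = sorted(machines, key=score, reverse=True)
    return ranked[:replicas]

# Hypothetical cluster of 60 nodes, as in the question.
machines = [f"node{i:02d}" for i in range(60)]
placement = assign_replicas("chunk-000123.bin", machines)
```

Every node can compute the same placement without a central lookup table, and removing a machine that does not hold a given file leaves that file's placement unchanged.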

The Disco Project tutorial describes exactly how to implement a solution for this kind of problem. You'll be impressed by how little code you need to write, and you can definitely keep your binary file format.
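As a rough illustration of keeping a binary format while writing map and reduce functions, here is a sketch in plain Python. It deliberately does not use the Disco API (see the tutorial for that), and the record layout, a 4-byte unsigned key followed by an 8-byte float, is an assumption, since the actual format is not given in the question.

```python
import struct
from collections import defaultdict

# Hypothetical record layout: little-endian uint32 key + float64 value.
RECORD = struct.Struct("<Id")

def read_records(blob):
    """Yield (key, value) tuples from a bytes object of packed records."""
    for offset in range(0, len(blob) - RECORD.size + 1, RECORD.size):
        yield RECORD.unpack_from(blob, offset)

def map_chunk(blob):
    """Map step: emit (key, value) pairs for one chunk of a file."""
    return list(read_records(blob))

def reduce_pairs(pairs):
    """Reduce step: sum the values seen for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two in-memory chunks standing in for files stored on different machines.
chunks = [
    RECORD.pack(1, 2.0) + RECORD.pack(2, 0.5),
    RECORD.pack(1, 3.0),
]
mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
totals = reduce_pairs(mapped)
```

In a real job the framework would run the map step on the machine that stores each chunk and ship only the emitted pairs to the reducers.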

Another similar option is Amazon's Elastic MapReduce.


Perhaps some of the commentary on the Kivaloo system developed for Tarsnap will help you decide what's most appropriate: http://www.daemonology.net/blog/2011-03-28-kivaloo-data-store.html

Without knowing more about your application (size/type of records, frequency of reading/writing) or your custom format, it's hard to say more.
