Large data store (NoSQL or not)

I have large amounts of scientific data that I need to store (150 TB+ of starting data), and I want to know the best way to store it (NoSQL, RDBMS, etc.).

Any tips?

James


To choose between NoSQL and an RDBMS, answer this question: "Is my data structured in relationships?"


This really depends on what you need to do with the data later on. If the data is a collection of a few very large files, then a normal file system would be OK. If you need to be able to search and analyse the data, then a database might be the best solution.

I am working with large datasets in a scientific environment as well. Most of this data is tabular, and when we started we stored every data point in a table. In the end we found it much easier to zip the tables and store each one as a binary blob in the database. In a separate table we stored the metadata about these tables.
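A minimal sketch of that blob-plus-metadata layout, assuming SQLite and zlib from the Python standard library in place of whatever database and compression the setup above really used. The table names (`table_blobs`, `table_meta`), the helper functions, and the sample rows are all hypothetical:

```python
import csv
import io
import sqlite3
import zlib


def pack_table(rows):
    """Serialize rows to CSV text and compress the result with zlib."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return zlib.compress(buf.getvalue().encode("utf-8"))


def unpack_table(blob):
    """Decompress a blob and parse it back into a list of rows."""
    text = zlib.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# In-memory database for illustration; a real deployment would use a server DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_blobs (id INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
)
conn.execute(
    "CREATE TABLE table_meta ("
    " blob_id INTEGER REFERENCES table_blobs(id),"
    " name TEXT, n_rows INTEGER, columns TEXT)"
)

# Hypothetical sample table: a header row followed by two data rows.
rows = [["time_s", "temp_k"], ["0", "293.1"], ["1", "293.4"]]

# Store the compressed table, then describe it in the metadata table.
cur = conn.execute(
    "INSERT INTO table_blobs (payload) VALUES (?)", (pack_table(rows),)
)
conn.execute(
    "INSERT INTO table_meta VALUES (?, ?, ?, ?)",
    (cur.lastrowid, "run_001", len(rows) - 1, ",".join(rows[0])),
)

# Search the small metadata table first, then fetch only the blob that matches.
(blob_id,) = conn.execute(
    "SELECT blob_id FROM table_meta WHERE name = ?", ("run_001",)
).fetchone()
(payload,) = conn.execute(
    "SELECT payload FROM table_blobs WHERE id = ?", (blob_id,)
).fetchone()
restored = unpack_table(payload)
```

Queries only touch the metadata table, so the database never has to index individual data points. The trade-off is that you cannot filter on values inside a blob without decompressing it first.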


There are special-purpose databases for scientific data: http://www.dbms2.com/2009/09/12/xldb-scid/


Does it have to be one database type? Part of the idea behind NoSQL is that one size does not fit all, so why not use two or more NoSQL stores? How about one column store and one graph database?


You should look at NetCDF and HDF5. Also, consider using PyTables for accessing and extracting the data.
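As a rough illustration of the PyTables suggestion, here is a small sketch that writes a compressed HDF5 table and reads back a slice with an in-kernel query. It assumes the `tables` package is installed. The `Reading` schema, the file name, and the values are made up for the example:

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    """Hypothetical row layout: one timestamp and one measured value."""
    timestamp = tables.Float64Col()
    value = tables.Float64Col()


# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")

    with tables.open_file(path, mode="w") as h5:
        # zlib compression is applied transparently, chunk by chunk.
        table = h5.create_table(
            "/", "readings", Reading,
            filters=tables.Filters(complevel=5, complib="zlib"),
        )
        row = table.row
        for i in range(1000):
            row["timestamp"] = float(i)
            row["value"] = i * 0.5
            row.append()
        table.flush()

    with tables.open_file(path, mode="r") as h5:
        table = h5.root.readings
        # The condition is evaluated inside PyTables, so only matching
        # rows are materialized as Python objects.
        hits = [
            r["value"]
            for r in table.where("(timestamp >= 10) & (timestamp < 13)")
        ]
        n_rows = table.nrows
```

At the 150 TB scale the data would be split across many such files. This sketch only shows the access pattern for one file.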
