Database storage requirements and management for lots of numerical data
I'm trying to figure out how to manage and serve a lot of numerical data. Not sure an SQL database is the right approach. Scenario as follows:
- 10000 sets of time series data collected per hour
- 5 floating point values per set
- About 5000 hours' worth of data collected
So that gives me about 250 million values in total. I need to query this set of data by set ID and by time. If possible, also filter by one or two of the values themselves. I'm also continuously adding to this data.
This seems like a lot of data points. Assuming 4 bytes per value, the raw values come to about 1 GB (250 million × 4 bytes). I don't know what a general "overhead multiplier" for an SQL database is (row headers, indices and so on). Let's say it's 2, then that's about 2 GB of disk space.
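Spelled out, the estimate looks like this (plain Python just for the arithmetic; the 2x overhead factor is only a guess, not a measured value):

```python
# Back-of-envelope size estimate using the numbers from the question.
sets_per_hour = 10_000
values_per_set = 5
hours = 5_000
bytes_per_value = 4          # 32-bit float
overhead_multiplier = 2      # assumed allowance for row headers + indices

values_total = sets_per_hour * values_per_set * hours   # 250,000,000 values
raw_bytes = values_total * bytes_per_value               # ~1 GB raw
print(f"{values_total:,} values, ~{raw_bytes / 1e9:.1f} GB raw, "
      f"~{raw_bytes * overhead_multiplier / 1e9:.1f} GB with overhead")
```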
What are good approaches to handling this data? Some options I can see:
- Single PostgreSQL table with indices on set ID and time (a sketch of this single-table layout follows the list)
- Single SQLite table -- this seemed to be unbearably slow
- One SQLite file per set -- lots of .sqlite files in this case
- Something like MongoDB? Don't even know how this would work ...
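To make the single-table options concrete, this is roughly the layout I have in mind (illustrated with Python's built-in sqlite3 module; the table, column, and file names are placeholders, and the same schema would carry over to PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect("timeseries.db")  # hypothetical file name
conn.executescript("""
    CREATE TABLE IF NOT EXISTS samples (
        set_id INTEGER NOT NULL,   -- which of the 10000 sets
        ts     INTEGER NOT NULL,   -- sample time, e.g. unix epoch seconds
        v1 REAL, v2 REAL, v3 REAL, v4 REAL, v5 REAL
    );
    -- composite index covers "by set ID" and "by time within a set" queries
    CREATE INDEX IF NOT EXISTS idx_samples_set_ts ON samples (set_id, ts);
    CREATE INDEX IF NOT EXISTS idx_samples_ts     ON samples (ts);
""")

# Typical query: one set, one time window, filtered on one of the values.
rows = conn.execute(
    """
    SELECT ts, v1, v2, v3, v4, v5
    FROM samples
    WHERE set_id = ? AND ts BETWEEN ? AND ? AND v1 > ?
    ORDER BY ts
    """,
    (42, 1_600_000_000, 1_600_086_400, 0.5),  # example values only
).fetchall()
```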
Appreciate commentary from those who have done this before.
Mongo is a document store rather than a relational database; it might work for your data, but I don't have much experience with it.
I can tell you that PostgreSQL will be a good choice: it can handle that volume of data comfortably. SQLite is definitely not optimized for this kind of use case.
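For a rough idea of a schema that works well for this access pattern (a sketch only; I'm assuming the psycopg2 driver, a made-up connection string, and placeholder table/column names -- you would tune the types and indices to your actual queries):

```python
import psycopg2  # assumed driver; any PostgreSQL client works the same way

conn = psycopg2.connect("dbname=tsdata user=postgres")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS samples (
        set_id integer     NOT NULL,
        ts     timestamptz NOT NULL,
        v1 real, v2 real, v3 real, v4 real, v5 real
    );
    -- (set_id, ts) serves the "one set over a time window" access pattern
    CREATE INDEX idx_samples_set_ts ON samples (set_id, ts);
""")
conn.commit()

# Query by set ID and time range, filtering on one of the values.
cur.execute(
    """
    SELECT ts, v1, v2
    FROM samples
    WHERE set_id = %s
      AND ts BETWEEN %s AND %s
      AND v2 < %s
    ORDER BY ts
    """,
    (42, "2011-01-01", "2011-01-02", 3.5),  # example values only
)
rows = cur.fetchall()
```

With the composite index, a query for one set over a time window becomes an index range scan rather than a scan over all ~50 million rows, which is what keeps this workable at your scale.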