Storing Scientific Data in a Relational Database
I want to store hierarchical, two-dimensional scientific datasets in a relational database (MySQL or SQLite). Each dataset contains a table of numerical data with an arbitrary number of columns. In addition, each dataset can have one or more children of the same type associated with a given row of its table. Each dataset typically has between 1 and 100 columns and between 1 and 1,000,000 rows. The database should be able to handle many datasets (>1000), and reading and writing data should be reasonably fast.
What would the best DB schema to store such kind of data? Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?
Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?
That's how I'd do it.
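To make the "master table plus one table per dataset" idea concrete, here is a minimal sketch using SQLite from Python. All table and column names here are illustrative, and the parent/child link (a child dataset pointing at a specific row of its parent's table) is one possible way to model the hierarchy described in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Master table: one row per dataset, with an optional link tying a
# child dataset to a specific row of its parent's data table.
con.execute("""
    CREATE TABLE datasets (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        parent_id  INTEGER REFERENCES datasets(id),
        parent_row INTEGER          -- row in the parent's data table
    )
""")

def create_dataset(name, columns, parent_id=None, parent_row=None):
    """Register a dataset in the master table and create its own data table."""
    cur = con.execute(
        "INSERT INTO datasets (name, parent_id, parent_row) VALUES (?, ?, ?)",
        (name, parent_id, parent_row))
    ds_id = cur.lastrowid
    cols = ", ".join(f'"{c}" REAL' for c in columns)
    con.execute(f'CREATE TABLE "data_{ds_id}" (row_id INTEGER PRIMARY KEY, {cols})')
    return ds_id

ds = create_dataset("experiment_1", ["time", "voltage", "current"])
con.executemany(
    f'INSERT INTO "data_{ds}" (time, voltage, current) VALUES (?, ?, ?)',
    [(0.0, 1.2, 0.3), (0.1, 1.3, 0.4)])

# A child dataset attached to row 1 of experiment_1's table:
child = create_dataset("fit_results", ["param", "value"],
                       parent_id=ds, parent_row=1)
```

Note that the per-dataset table names have to be built with string formatting, since SQL parameters cannot stand in for identifiers — one of the costs of a schema determined at runtime.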
I'm not exactly sure how the 'arbitrary columns' requirement would work in practice, because data usually doesn't behave like that. Regardless, it sounds like storing it as (row, col, val) triples might work nicely.
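The (row, col, val) layout can be sketched as a single long-format table; the table and column names below are illustrative, assuming SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One generic table holds every value of every dataset; the number of
# logical columns per dataset is unconstrained by the schema.
con.execute("""
    CREATE TABLE values_long (
        dataset_id INTEGER NOT NULL,
        row        INTEGER NOT NULL,
        col        TEXT    NOT NULL,
        val        REAL,
        PRIMARY KEY (dataset_id, row, col)
    )
""")

con.executemany(
    "INSERT INTO values_long VALUES (?, ?, ?, ?)",
    [(1, 0, "time", 0.0), (1, 0, "voltage", 1.2),
     (1, 1, "time", 0.1), (1, 1, "voltage", 1.3)])

# Reassemble one logical row of dataset 1 as a dict:
row0 = dict(con.execute(
    "SELECT col, val FROM values_long WHERE dataset_id = 1 AND row = 0"))
```

The trade-off is the usual one for this kind of layout: maximum flexibility, but every value costs extra key storage, and reassembling wide rows requires pivoting.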
Honestly though, if you don't need to search through it (max, min, etc.), it might be better to use some kind of flat file.
An alternative setup that might be interesting is using SQLite, with a separate database file for each dataset, plus one master one.
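A rough sketch of that per-file setup, assuming made-up paths and table names: the master database records where each dataset file lives, and `ATTACH DATABASE` lets a query span the master and one dataset file when needed:

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()

# Master database: one row per dataset, pointing at its own file.
master = sqlite3.connect(os.path.join(workdir, "master.db"))
master.execute("""
    CREATE TABLE datasets (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        path TEXT NOT NULL      -- SQLite file holding this dataset
    )
""")

def new_dataset(name, columns):
    """Create a dataset file and register it in the master database."""
    path = os.path.join(workdir, f"{name}.db")
    master.execute("INSERT INTO datasets (name, path) VALUES (?, ?)",
                   (name, path))
    ds = sqlite3.connect(path)
    cols = ", ".join(f'"{c}" REAL' for c in columns)
    ds.execute(f"CREATE TABLE data ({cols})")
    ds.commit()
    ds.close()
    return path

path = new_dataset("run_42", ["x", "y"])

# Cross-file access via ATTACH:
master.execute("ATTACH DATABASE ? AS run", (path,))
master.execute("INSERT INTO run.data VALUES (1.0, 2.0)")
```

One file per dataset keeps each dataset independently copyable and deletable, at the cost of managing many files and SQLite's limit on the number of simultaneously attached databases.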
Whatever you pick, how well it will work really depends on what you're going to do with the data.
You're going to end up trading off flexibility for performance, I think. You can hard-code your DB schema, which it sounds like you want to avoid but which would give you the best performance, or leave the schema determined at runtime and stored in a 'master' table, which increases your flexibility but reduces your ability to enforce referential integrity and set data types.
If you can spare the time, you could try both approaches for a while until you have enough information about which will perform better for your task.
It's hard to be specific without understanding the problem domain, but if your data is inherently relational, use a relational model. If your data is not inherently relational, I wouldn't try to force it into a relational model for the sake of it — the fact that all datasets happen to have an ID doesn't mean those IDs mean the same thing, or even that they are suitable for use as a primary key.
I'd suggest starting by having each data set in its own table (or tables if there are child records), and create a master table if you need to.
I'd share zebediah49's question on "are you really going to use a database for this? Wouldn't flat files be better?"
We store a bunch of data like this, each dataset in its own flat file. The header of the file contains enough information (timestamp, number of rows/columns, etc.) for the file to be read on its own. Metadata about each file is then kept in the database: at minimum the file location, but it can include other information about the data. For example, we aggregate the data into proxy variables that summarize the details at a high level. Typically this summary data is good enough, but when necessary we can read the file for all the details.
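The approach above can be sketched as follows. This is not the poster's actual format — the binary header layout (timestamp, row count, column count) and the summary columns are assumptions for illustration:

```python
import os
import sqlite3
import struct
import tempfile

workdir = tempfile.mkdtemp()

def write_dataset(path, timestamp, rows):
    """Write a self-describing flat file: fixed header, then raw doubles."""
    n_rows, n_cols = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(struct.pack("<dII", timestamp, n_rows, n_cols))  # 16-byte header
        for row in rows:
            f.write(struct.pack(f"<{n_cols}d", *row))

def read_dataset(path):
    """Read the header, then the full table of doubles."""
    with open(path, "rb") as f:
        timestamp, n_rows, n_cols = struct.unpack("<dII", f.read(16))
        data = [struct.unpack(f"<{n_cols}d", f.read(8 * n_cols))
                for _ in range(n_rows)]
    return timestamp, data

path = os.path.join(workdir, "scan_001.bin")
write_dataset(path, 1_700_000_000.0, [(0.0, 1.2), (0.1, 1.4)])

# The database stores only the location plus high-level summaries:
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT, timestamp REAL, n_rows INT, max_val REAL)")
ts, data = read_dataset(path)
con.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
            (path, ts, len(data), max(max(r) for r in data)))
```

Queries against the summaries stay fast regardless of how large the flat files grow, and the files themselves remain readable without the database because of the self-describing header.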