100s of Tables vs. a Single Large Table

I'm working on a problem where we analyze a substantial amount of data from a table. We need to pull in certain subsets of this data and analyze each one. As it stands, I believe the best approach is to multithread the work: bring in as much data as possible up front and perform various computations on each region. Let's denote the subsets of data to analyze as S1, S2, …, with one thread per subset. After the calculations, some visualization may be created as well, and the results will need to be stored back into the database, since the analysis results could amount to many gigabytes. Let's denote the results by R1, R2, …
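
For context, here is the rough shape of the pipeline I'm picturing: a minimal sketch assuming a fixed thread pool and a hypothetical analyze() routine per subset:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AnalysisDriver {
    public static void main(String[] args) {
        List<Integer> subsetIds = List.of(1, 2, 3); // S1, S2, S3, ...
        ExecutorService pool = Executors.newFixedThreadPool(subsetIds.size());
        for (int id : subsetIds) {
            // Each task loads subset S_id, computes, and stores result R_id.
            pool.submit(() -> analyze(id));
        }
        pool.shutdown();
    }

    // Hypothetical per-subset pipeline: load S_id, run the computations,
    // then persist R_id back to the database.
    static void analyze(int subsetId) {
        // ... load rows for subsetId, compute, write results back ...
    }
}
```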

Although this is a little vague, I am wondering whether we should create a table for each of R1, R2, etc., or store all of the results in a single table. Multiple threads will likely be storing results at the same time (recall the threads for S1, S2, …), so with a single table I need to ensure that multiple threads can write to it concurrently. If it helps, when the data for R1, R2, etc. is needed again, all of it will be pulled out in a particular order, which would be easy to maintain if there were a table for each result set.

I was also thinking that we could have a single object per table that manages requests to that particular results table if we go that route. Essentially, I would like the object to behave like a bean that loads data from the database only as necessary, since there is too much to keep in memory at once. One more point: we are using InnoDB as our storage engine, in case that makes any difference as to whether multiple threads can access a particular table.
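
To make the bean idea concrete, here is a minimal sketch of what I mean, assuming plain JDBC, a single results table with region_id, seq, and value columns, and page-at-a-time loading (all names are placeholders):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// One instance per results region; pages rows in from the database
// on demand instead of holding the whole region in memory.
public class ResultsAccessor {
    private final Connection conn;
    private final int regionId;   // which R_i this accessor serves
    private final int pageSize;

    public ResultsAccessor(Connection conn, int regionId, int pageSize) {
        this.conn = conn;
        this.regionId = regionId;
        this.pageSize = pageSize;
    }

    // Load one page of results, ordered by a sequence column so the
    // original per-region ordering survives in a single shared table.
    public List<Double> loadPage(int pageNo) throws SQLException {
        String sql = "SELECT value FROM results "
                   + "WHERE region_id = ? ORDER BY seq LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, regionId);
            ps.setInt(2, pageSize);
            ps.setInt(3, pageNo * pageSize);
            List<Double> page = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(rs.getDouble("value"));
                }
            }
            return page;
        }
    }
}
```

With one accessor instance per region, the read order I need is preserved by the seq column even though everything lives in one table.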

So, with this bit of information, would it be best to store all of the results in a single table, or to create one table for each region of results (possibly hundreds)?

Thanks


You could, but then you would have to manage hundreds of tables, and getting statistics across the whole result set would be that much more difficult.

If the data can be cleanly partitioned into subsets that do not intersect, the threads will never contend for the same rows, especially if you are just doing reads and processing in your application. InnoDB uses row-level locking, so concurrent access to a single table from multiple threads is not a problem. In that case you don't need to split the data into hundreds of tables, and each thread in your application can work independently.
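
For instance, with a single results table keyed by region, each worker thread can batch-insert only its own region's rows, so InnoDB's row locks from different threads never collide. A minimal sketch, assuming JDBC with the MySQL driver on the classpath and illustrative table/column names:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Each thread gets its own connection and writes only rows for its
// own region, so concurrent writers never touch the same rows.
public class ResultsWriter implements Runnable {
    private final int regionId;
    private final double[] values;

    public ResultsWriter(int regionId, double[] values) {
        this.regionId = regionId;
        this.values = values;
    }

    @Override
    public void run() {
        String sql = "INSERT INTO results (region_id, seq, value) VALUES (?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/analysis", "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            for (int i = 0; i < values.length; i++) {
                ps.setInt(1, regionId);
                ps.setInt(2, i);            // preserves within-region order
                ps.setDouble(3, values[i]);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();                   // one transaction per region
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

Giving each thread its own connection matters: JDBC Connection objects are not safe to share across threads, and committing once per region keeps transactions short.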


This sounds like a good MapReduce candidate. That's assuming you are going to perform the same calculation on the whole set and just want to speed up the process (see the sketch at the end of this answer).

Have you considered using something like MongoDB? You can write your own map-reduce aggregations in it.

MapReduce: http://en.wikipedia.org/wiki/MapReduce

MongoDB map-reduce: http://www.mongodb.org/display/DOCS/MapReduce

Mongo does support update-in-place, and it's a lockless, eventually consistent store.
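
To show the map-reduce shape in plain Java terms (a toy sketch, not MongoDB's actual API; the per-region sum here stands in for whatever your real computation is):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceSketch {
    // A (regionId, value) pair standing in for one raw row.
    record Row(int regionId, double value) {}

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, 2.0), new Row(1, 3.0), new Row(2, 5.0));

        // Map: key each row by region. Reduce: aggregate values per region.
        // Subsets S1, S2, ... are processed in parallel, yielding R1, R2, ...
        Map<Integer, Double> results = rows.parallelStream()
            .collect(Collectors.groupingBy(
                Row::regionId,
                Collectors.summingDouble(Row::value)));

        System.out.println(results); // e.g. {1=5.0, 2=5.0}
    }
}
```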
