MySQL Casting Performance Benchmark Question / Data Architecture

Currently I'm working with a data set that is just ridiculous: a flat file from multiple vendors with no rhyme or reason to it, sitting at about 200 columns. Fifteen of those columns are common across vendors, and I have pulled them out into another table.

The other 185 columns are a mix of varchars, ints, datetimes, and multi-valued strings.

Right now I'm trying to decide how best to store these other 185 columns, since as a flat table it's proving to scale very poorly. I have two solutions set up, but I don't know which one is better.

One is storing the metadata for each of the columns in separate tables (shown in an attached schema image).

However, it seems that with this method it will be very difficult to query the items stored there down the road.

The other method I've thought of is throwing all the columns into a table which has id, value, and datatype, then casting the value to the datatype when querying, e.g.:

 select * from foo where cast(col_to_query as signed) < 5

However, I'm not sure what the performance is like when doing things that way.
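For concreteness, here is a minimal sketch of what that second layout might look like; the table name col_values and the column names are illustrative, not from the original post:

create table col_values (
  record_id int not null,
  col_name  varchar(64) not null,  -- which of the 185 source columns this row holds
  value     varchar(255),          -- everything stored as text
  datatype  varchar(16) not null,  -- 'int', 'varchar', 'datetime', ...
  primary key (record_id, col_name)
);

-- Filtering on a numeric column means casting the stored text.
-- Note that MySQL's CAST uses SIGNED/UNSIGNED as the integer target type.
select record_id
from col_values
where col_name = 'some_int_column'
  and cast(value as signed) < 5;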

Question:

Which of these two methods would be better performance-wise, and which one would you recommend (or if there is a better option, I would love to hear it)?

Thank you


The first approach will scale even worse than a single table, and will be incredibly difficult to query to boot.

I would suggest using a single table with all the columns in it as a starting approach. You said it scales poorly, though. What do you mean by that? How is it scaling poorly? Are queries taking a long time to return? Have you indexed the table properly for your queries? The number of columns usually doesn't significantly affect how long queries take to return, unless they're returning a huge amount of data. If that's the case, how you store it under the covers will have little effect on the query response time, since all the time is being spent transferring data between MySQL and the client. Be sure that you're only selecting the columns you care about if this is the case; don't do "select *".
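As a quick illustration of those last two points (the table and column names here are hypothetical, not from the post):

-- Index the columns your WHERE clauses actually filter on.
create index idx_col42 on vendor_data (col42);

-- Select only the columns you need, rather than "select *",
-- so less data is transferred from MySQL to the client.
select id, col42, col77
from vendor_data
where col42 < 5;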

Another option would be to use a table inheritance strategy. In this case, you would have one parent table that stores the 15 common attributes, plus a "type" column that identifies the kind of record based on the file it came from (you could also call it the source). Then, create an extension table with a one-to-zero-or-one mapping for each of the different files, containing only the custom columns for that specific file. This most likely won't perform as well as one large table, since you'll have to do joins, but it avoids having a whole bunch of columns on one table that are often null.

This would look something like this:

create table master (
  master_id int not null auto_increment primary key,
  type int,             -- identifies which source file/vendor the record came from
  <field1> int,         -- the 15 common columns go here
  <field2> varchar(20),
  ...
);

create table file1_data (
  master_id int not null primary key,  -- same id as master; also a foreign key to it
  type int,
  <field16> int,                       -- columns specific to file 1 only
  <field17> varchar(20),
  ...
);

Query it like this:

select <field1>, <field16>, ...
from master
inner join file1_data on file1_data.master_id = master.master_id
where ...
