MySQL Casting Performance Benchmark Question / Data Architecture

Currently I'm working with a data set that is just ridiculous: a flat file from multiple vendors with no rhyme or reason to it, sitting at about 200 columns. Fifteen of those columns are common across vendors, and I have pulled them out into another table.

The other 185 columns are a mix of varchars, ints, datetimes, and multi-valued strings.

Right now I'm trying to decide how best to store these other 185 columns, since as a flat table it's proving to scale very poorly. I have two solutions set up, but I don't know which one is better.

One is storing the metadata for each of the columns in separate tables (shown in an attached schema image).

However, it seems that with this method it will be very difficult to query the items stored there down the road.

The other method I've thought of is throwing all the columns into a table which has id, value, and datatype, then casting the value to the datatype when querying, e.g.:

 select * from foo where cast(col_to_query as signed) < 5

However, I'm not sure what the performance is like when doing things that way.
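For concreteness, here is a minimal sketch of what that second layout might look like; the table name col_values and the column names are illustrative, not from the original post:

create table col_values (
  record_id int not null,
  col_name  varchar(64) not null,  -- which of the 185 source columns this row holds
  value     varchar(255),          -- everything stored as text
  datatype  varchar(16) not null,  -- 'int', 'varchar', 'datetime', ...
  primary key (record_id, col_name)
);

-- Filtering on a numeric column means casting the stored text.
-- Note that MySQL's CAST uses SIGNED/UNSIGNED as the integer target type.
select record_id
from col_values
where col_name = 'some_int_column'
  and cast(value as signed) < 5;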

Question:

Which of these two methods would be better performance-wise, and which one would you recommend (or if there is a better option, I would love to hear it)?

Thank you


The first approach will scale even worse than a single table, and will be incredibly difficult to query to boot.

I would suggest using a single table with all the columns in it as a starting approach. You said it scales poorly, though. What do you mean by that? How is it scaling poorly? Are queries taking a long time to return? Have you indexed the table properly for your queries? The number of columns usually doesn't significantly affect how long queries take to return, unless they're returning a huge amount of data. If that's the case, how you store it under the covers will have little effect on the query response time, since all the time is being spent transferring data between MySQL and the client. Be sure that you're only selecting the columns you care about if this is the case; don't do "select *".
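As a quick illustration of those last two points (the table and column names here are hypothetical, not from the post):

-- Index the columns your WHERE clauses actually filter on.
create index idx_col42 on vendor_data (col42);

-- Select only the columns you need, rather than "select *",
-- so less data is transferred from MySQL to the client.
select id, col42, col77
from vendor_data
where col42 < 5;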

Another option would be to use a table inheritance strategy. In this case, you would have one parent table that stores the 15 common attributes, plus a "type" column that identifies the kind of record based on the file it came from (you could also call it the source). Then, create an extension table with a one-to-zero-or-one mapping for each of the different files, containing only the custom columns for that specific file. This most likely won't perform as well as one large table, since you'll have to do joins, but it avoids having a whole bunch of columns on one table that are often null.

This would look something like this:

create table master (
  master_id int not null auto_increment primary key,
  type int,             -- identifies which source file/vendor the record came from
  <field1> int,         -- the 15 common columns go here
  <field2> varchar(20),
  ...
);

create table file1_data (
  master_id int not null primary key,  -- same id as master; also a foreign key to it
  type int,
  <field16> int,                       -- columns specific to file 1 only
  <field17> varchar(20),
  ...
);

Query it like this:

select <field1>, <field16>, ...
from master
inner join file1_data on file1_data.master_id = master.master_id
where ...
