开发者

mysql empty columns

I'm on an optimization kick, at the moment.

I tend to use开发者_StackOverflow multiple tables, so I don't have empty columns.

My question is, are empty columns a big deal? I'm not talking for space. I'm referring to speed of indexing, data retrieval, etc...

My bet example is when I have a simple customers table, and some columns are not always filled. Like email, dob, ssn, or pic. I'd say most of the time they are not filled in.

That causes me to create a new table to house just the ancillary data. but would it really make a difference if I put these columns in the same table with the rest of the customer info?

If I do this, then there will be many records with empty columns. Which causes me to wonder how much this affects performance when the record count is huge.


If you're on an optimization kick, my advice is to get off it :-)

Optimization is something that should be done in response to a performance problem, not a whim. If there's no performance problem, all optimization is wasted effort.

Empty fields rarely make a large difference to data retrieval in a properly designed schema since most queries should, as much as possible, use indexes only for deciding which rows to get. Once the rows are discovered, that's when you go to the table to get the actual data.

And speed of indexing won't change just because the column is stored in another table. If it needs to be indexed, then it needs to be indexed.

I prefer my schema to be as simple as possible (while still mostly following 3NF) so as to avoid unnecessary joins.


If you store them as a variable-length field (eg: VARCHAR), empty columns won't take up as much (any?) space. This comes at the expense of slower lookups compared to tables that only have fixed length fields.

I personally think it's fine to have empty columns, even when you have many of them (also known as a sparse table). Some databases even have optimizations for sparse tables. If you start having many extra tables your logic becomes more complex, and it makes it harder to maintain referential integrity.

What you could do in your customers table is to have one extra customer_profiles table with a 1:1 relationship with the customers table. Store the essential information in customers and the rest (ie: stuff you don't need every time you look up a customer) in the customer_profiles table.


Using an external table to host ancillary data is one of the options, just like nullable columns.

It can save some space but requires more resources to join the table.

If your model is a sparse matrix (lots of attributes, most of which will not be defined), than the cost of storing and scanning these attributes may even overweight the cost of a JOIN.

With an additional table, however, you won't be able to create an index which would cover two attributes from different tables.

Relational model usually allows using more than one method to implement an ER model, and that's exactly what illustrates it.

You may want to read this article:

  • What is entity-relationship model?
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜