Creating a db table NULL best practices
Not sure on what the best practices are for dealing with NULL
values when I have a single table where two fields are only sometimes populated creating a lot of NULL
values in the rows.
Should the two fields be moved to a separate table creating two tables with no NULL
values?
A join across these two tables would just return a result that equals my original table with the NULL
'开发者_如何学Gos, so what's the point in that?
Seems pointless to separate them but I have been reading a bit about avoiding null's all together in the db.
Purely theoretically, a NULL is supposed to mean "unknown value". So - again, purely theoretically - you should design your tables when normalized so that you don't need to fill out NULL values to mean "not applicable for this row". However, this point has pretty much no relation to any practical consideration (design, performance, or query readability).
Practically, there are some performance considerations. You should normalize away very sparse data in the following cases:
There is material benefit from shortening the table (both IO wise and/or space wise). NULLs do take space, and the wider the rows the worse the performance. This is especially true when the table has a LOT of rows and there are many such sparse columns. For smaller table with only 2 such columns the benefits realized might not be worth the trouble of having an extra join.
Your queries have the column in question in the
WHERE
clause. IIRC, querying on a heavily NULL-ed column is rather inefficient.On the other hand, at certain point, having extra joins in the query might hurt the optimizer performance (at least it does so on Sybase once your joins have 10+ tables - from taking up CPU resources when optimizer runs to actually confusing the optimizer to pick a VERY bad plan). The solution is to avoid having too-many tables due to normalization (as in, don't bother splitting your 2 columns into a separate table), or forcing the query plan. The latter is obviously Bad Juju.
Nulls cause incorrect and inconsistent results in queries and generally increase code complexity due to the special handling needed in code that has to process them. For these reasons it usually makes sense to avoid or minimise nulls in your database designs. You don't need to use nulls in queries either - although SQL unfortunately makes them very difficult to avoid. However, by not using nulls in base tables you will ensure that your data model more accurately reflects reality and you will give database users more control over how they want nulls to be used.
As dportas implies in a comment, it is helpful to know what a null
value in a particular field means - not what it means in theory, but what it means in your data.
I think as long as you are clear what a null
means in your table, and if you are sure it only means one thing, you can make an informed pragmatic decision about whether to allow it.
Opinion: My rule of thumb is that nullable fields are fine but shouldn't multi-task
Nulls are critical to have in a database. I have never yet dealt with a database that didn't allow nulls that in the end wasn't much harder to query, much harder to maintain (how do you decide what value means I don't know the answer) and usually have more bad data. Yes nulls require special handling in queries, so do things like adding a much later date (1/1/9999) as the end date to avoid having a null.
The truth is, some data is just not known at the time the record is inserted. There is no substitute for null.
Now in you case, whther you should break out to two tables depnds alot on the width of the tables and the frequency you will need to query those nullable colulmns. I would not likely move a middlename column to another table even though I had a lot of nulls becasue it is always queried with the other information in the base table. I also would be unlikely to move an end date column. But if the columns were things that it is nice to knw byut not usually queried whenever you query the base data (such as Birthday, hair color, etc.), then a spearate table for only the records which contain the data may be fine. Remember though, when you query if you use an inner join, you eliminate all the records which don't have a value in the second table. If I would usually want all the records (like with middle name, I rarely query just to find the people with a middle name of 'Mary'), then I tend to keep them inthe same table unless the table is getting very wide and I usually don't want to query that information.
精彩评论