Is it efficient to have 90 percent of rows to have field set to NULL?
I ha开发者_StackOverflow社区ve a table in my MySQL (InnoDB) full with user items. Basically each row has a user_id field and other item properties like color. Then there is one more field called a link which holds the id of some other user's item, but in most cases (90 %) there is no item linked and thus the field is set to NULL
.
I was wondering would it be more efficient to make a new table which would hold the link information than having 90 % of 6 million rows to have the field link set to NULL
?
I'm using Hibernate.
Yes, it would be more efficient. It would make a very small difference.
Best is to do whatever is easiest for you and then change it when it becomes a real problem.
Yes, it would be more efficient and more normalized. Whenever I see a table with lots of nulls like this, I consider it a candidate for normalization. In this example, you could remove that column from the table entirely and it would be much more cleaner and easier to maintain. You would just create a junction table with a two user_ids that are foreign keys on the user items table.
as far as your logic is concerned, just consider storing only the tables with link data, and the code that calls if can do an "if not null" or equivalent to always know what to get. Don't store all those nulls if you can make better assumptions
It's going to occupy less space. But if you do a (left) JOIN for each query the performance is going to be worse.. especially if you have many rows and the table doesn't fit in memory. Then you need two disk seeks to fetch one record.
Update:
- JOIN takes some additional processing. It's going to be fast if you have indexes but still, you have to look up another record. And if you use InnoDB to support transactions the database has to maintain a version for the joined record.
- JOIN is bad for memory locality, now you have to look up a record which is at an entirely different memory location.
- As I mentioned if the data is not in the memory you need an additional disk seek. This is really bad.
精彩评论