Are there performance advantages by splitting a Django model/table into two models/tables?
In SO question 7531153, I asked the proper way to split a Django model into two—either using Django's Multi-table Inheritance or explicitly defining a OneToOneField.
Based Luke Sneeringer's comment, I'm curious if there's a performance gain from splitting the model in two.
The reason I was thinking about splitting the model in two is because I ha开发者_开发知识库ve some fields that will always be completed, while there are other fields that will typically be empty (until the project is closed).
Are there performance gains from putting typically empty fields, such as actual_completion_date
and actual_project_costs
, into a separate model/table in Django?
Split into Two Models
class Project(models.Model):
project_number = models.SlugField(max_length=5, blank=False,
primary_key=True)
budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
submitted_on = models.DateField(auto_now_add=True)
class ProjectExtendedInformation(models.Model):
project = models.OneToOneField(CapExProject, primary_key=True)
actual_completion_date = models.DateField(blank=True, null=True)
actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
blank=True, null=True)
Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN will be required, which is inherently slower for a database to perform than a simple SELECT query. The fact that the fields are empty is meaningless in terms of performance one way or another.
Depending on the size of the table and the number of columns, it may be faster to only select a subset of fields that you need to interact with, but that's easy enough in Django with the only
method:
Project.objects.only('project_number', 'budgeted_costs', 'submitted_on')
Which produces something akin to:
SELECT ('project_number', 'budgeted_costs', 'submitted_on') FROM yourapp_project;
Using separate models (and tables) only makes sense for the purposes of modularization -- such that you subclass Project
to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project
.
For your case, if there's some info that's available only when it's closed, I'd indeed advise making a separate model.
Joins aren't bad. Especially in your case the join will be faster if you have all rows in one table and much fewer rows in the other one. I've worked with databases a lot, and in most cases it's a pure guess to tell if a join will be better or worse. Even a full table scan is better than using an index in many cases. You need to look at the EXPLAINs, if performance is a concern, and profile the Db work if possible (I know Oracle supports this.) But before performance becomes an issue, I prefer quicker development.
We have a table in Django with 5M rows. And we needed a column that would have been not null only for 1K rows. Just altering the table would have taken half a day. Rebuilding from scratch also takes a few hours. We've chosen to make a separate model.
I've been to a lecture on Domain Driven Design in which the author explained that it is important, especially in development of a new app, to separate models, to not stuff everything in one class.
Let's say you have a CargoAircraft class and PassengerAircraft. It's so tempting to put them in one class and work "seamlessly", isn't it? But interactions with them (scheduling, booking, weight or capacity calculations) are completely different.
So, by putting everything in one class you force yourself to bunch of IF clauses in every method, to extra methods in Manager, to harder debugging, to bigger tables in the DB. Basically you make yourself spend more time developing for the sake of what? For only two things: 1) fewer joins 2) fewer class names.
If you separate the classes, things go much easier:
- clean code, no ugly ifs, no .getattr and defaults
- easy debugging
- more mainainable database
hence, faster development.
精彩评论