
Create a database for each Django Model instance

I have one "master Model" with lots of "child Models". Since I know there will be a lot of data (through the children), I would like to dynamically store each new instance of that MasterModel in a separate database (along with all its children):

class MasterModel(models.Model):
    name = models.CharField(max_length=100)
    # db_connection_name = 'mastermodel_1_db'

class ChildModelA(models.Model):
    mastermodel = models.ForeignKey(MasterModel, on_delete=models.CASCADE)

class ChildModelB(models.Model):
    mastermodel = models.ForeignKey(MasterModel, on_delete=models.CASCADE)
    children = models.ManyToManyField(ChildModelA)

class ChildModelC(models.Model):
    ...

There are a lot of children and relationships, but never between MasterModel objects.

From here, I guess that for each new MasterModel instance I have to (by overriding the save() method):

  1. dynamically update the settings.DATABASES dictionary to add a new database: 'mastermodel_1_db', 'mastermodel_2_db', ...
  2. run syncdb (to create the schema/tables) on that database
  3. then use a custom DatabaseRouter to manage database transactions

like this:

class MyDatabaseRouter(object):
    def db_for_read(self, model, **hints):
        # For any model, return the database of the related
        # mastermodel object. The router receives the model
        # class, so the instance has to come in via hints.
        instance = hints.get('instance')
        if instance is not None and hasattr(instance, 'mastermodel'):
            return instance.mastermodel.db_connection_name
        return None

    db_for_write = db_for_read
Am I on the right track?


This sounds like a massive case of premature optimization. Think about what you're doing. You're setting up a system with a huge number of moving parts and likely race conditions, one that is complicated and hard to maintain. All of this before you've even seen whether the data will present performance issues.

If you are really convinced (and maybe have some numbers to back it up) that the size of the data you're dealing with will actually tax your system, these are the steps you should take:

  1. Better hardware. Hardware is way cheaper than developer time.
  2. Partitioning
  3. Correctly done sharding (what you're proposing is unlikely to be a correct sharding approach)

It's important to note that 1 and 2 together should (under most circumstances) be able to get you to tables with billions of rows.

You should have a read over at http://www.flounder.com/optimization.htm, particularly the last lines:

Optimization matters only when it matters. When it matters, it matters a lot, but until you know that it matters, don't waste a lot of time doing it. Even if you know it matters, you need to know where it matters. Without performance data, you won't know what to optimize, and you'll probably optimize the wrong thing.

The result will be obscure, hard to write, hard to debug, and hard to maintain code that doesn't solve your problem. Thus it has the dual disadvantage of (a) increasing software development and software maintenance costs, and (b) having no performance effect at all.
