Multitenant DB: Why put a TenantID column in every table?
Every tutorial I've seen about M开发者_StackOverflow中文版ultitenant database models tells you to put the TenantID in every single table:
zoos
-------
id
zoo_name
tenant_id
animals
-------
id
zoo_id
animal_name
tenant_id
However, this seems redundant to me. Why not add the tenant_id
column to just the zoos
table and exploit the foreign key relationship between zoos
and animals
?
Do you add tenant_id
to every table just to keep the joins from getting too crazy? Is it a safeguard against bugs? A performance consideration?
If one of your key design considerations is security--specifically, one client can no way no how no when access another client's data--then, depending on how you implement this security, sticking that qualifying column in every table may be necessary. One such tactic described here requires building a view on every table; assuming each table contains a tenantId column, then if properly configured each view could contain a "WHERE tenantId = SUSER_SID()" clause (and of course you configure the database so that clients can only access the views).
Another factor (as in my current job) is loading warehouse data (ETL). Tables are partitioned on tenantId (we use table partitioning, but partitioned views would also work), and data can be easily loaded or unloaded for a client without seriously impacting any other client.
But as ever, there's a lot of "it depends" involved. If there is no clear and present need, and a very low likelihood of future need, then normalize that column out. Just realize that it's more a devise of physical implementation than of conceptual or logical database design.
Its there for convenience and performance - in terms of normalisation you're absolutely right, it only needs to go in at the top. The problem then becomes that to get to some data (say zoo -> animal -> food -> supplier) you have to have horribly complex joins on what are notionally very simple queries.
So in the real world one has to compromise - question then becomes where and to what extent.
See this article Maybe Normalizing Isn't Normal - and its conclusion:
As the old adage goes, normalize until it hurts, denormalize until it works
as a place to start exploring the subject
If I had tenantID at the top of the hierarchy (i.e. at the zoo level) you have several issues to consider.
- The top of the hierarchy can never change, for example if you need to add a node on the tree above the zoo level (say regions -> zoos -> animals) then it will force a re-org every time.
- For certain queries, you will be forced to start at the top of the hierarchy, i.e. give me a list of all animals available will force you to start at the top of the tree
- Why not use schemas ? Each tenant is isolated within their own schema. This will also separate the data-sets nicely.
The first thing that springs to mind is that it's slower to look up animals > zoos > tenants
than simply animals > tenants
. And most likely this is a lookup you will do often (for example, "get all animals for a certain tenant, regardless of zoo").
For small to mid-sized applications you can get away with a more normalized structure, but for the sake of efficiency, you should go with extraneous data (and generally speaking, multitenancy applications are not small). Just make sure it doesn't go "out of sync", which is a risk that comes with having redundant data.
To answer your last paragraph, the reason is performance, pure and simple. Joins are no bad thing; they help you keep a piece of data in one place rather than three. It's definitely not to prevent bugs. Adding a tenant_id
field to more tables will increase the risk of bugs (although for an id that never changes, it wouldn't be as much of an issue).
Well, Bob may own a giraffe in zoo No1, while Joe may own a lion in the same zoo. They are not supposed to look at each others' data.
The reason N1 is for security.
Security need to be a strong concept in multi-tenant application.
Suppose that you give a user the ability to modify a Animal. You create a form whith a select that show the zoo fo the current tenant. What happen if the user hack the form and pass a zoo id of another tenant?
The animal will be moved to another zoo of another tenant!!
This is real pain in a multi tenant app!
精彩评论