Representing a many-many and a one-many as a single denormalized view, as a many-to-many, or another way?
I have a model question, followed by a view question in a separate SO post, which I will link to when I create it. I'm making two posts so I can do two acceptances.
Now, let's start by supposing that I have the following schema. This is a simplified version of a real schema I'm designing, with irrelevant columns ommitted. Names have been changed to protect the innocent, and my job.
tree
has the following attributes.
tree.id
tree.dob
tree.height
tree.forestid
forestid
is a foreign key to the forest
table, which is just a way to aggregate multiple tree
rows. It has an id
column and some metadata columns.
tree
and forest
both have potentially many treedata
rows. treedata
contains
treedata.value
treedata.treeid
treedata.forestid
treedata.treeid and treedata.forestid are constrained so that one of the two must be null.
If tree.forestid is not null, then the relationship between treedata and tree is many-to-many, and forest is the linker table. Otherwise, the relationship between tree and treedata is one to many. It is very important to my application that the user is able to group trees together into forests on the fly via the UI, and to set a treedata.value for the entire forest, but also be able to work with individual trees. Now, I can think of a couple of ways to represent this. One is to say that every tree has a forestid and is a forest of at least size 1. Then, the relationship is always many to many. Another is to provide a denormalized view along the lines of
select tree.*, treedata.*
from tree, treedata
where tree.id = treedata.treeid
union
select tree.*, treedata.*,
from tree, treedata
where tree.forestid = treedata.forestid.
Yet a third way would be to have a forestid
column in tree
and to drop the forest
table entirely. Down that path I see a difficulty in getting ACID guarantees with respect to proper incrementing of tree.forestid
. There's also the possibility that forests should contain their own metadata. I'd love to hear more ways to represent this, and also to get the opinion of more seasoned database people with respect to which way is preferable, and top marks if you explain why you think so by citing examples from your own experience.
Response to Martin Dom's suggestion of a TreeComposite table:
Thanks for your reply. I wanted to give the suggestion of a TreeComposite a day to simmer before responding. First of all, your way does model the relationship I'm expressing in normal form, so yes, I think that you do understand the question. However, I think I made a silly mistake by naming my tables Tree and Forest: because Forests don't need to be recursively composable into each other. They aren't computer science trees. They're just bark-and-twig trees. The parentid model, though it stil开发者_运维百科l represents what I need in normal form (it's a generalization of what I need), and though it has the advantage that Trees and Forests are "the same thing" now, which is on the surface a complexity win, I fear that in the soil it would be a tangled mess.
The problem is that whether or not they're called the same thing in my model, they're not not the same thing to my controller or view. E.G., a TreeComposite with child nodes will probably have at least attribute for which each node will have a distinct value. In that case, I need to use a different widget in my view to display multiple values for the attribute. Put another way,I need to be able to display each TreeComposite that is its own parent as a single row, and what that row looks like depends on whether the TreeComposite has children.
So the first thing I have to do after I extract a TreeComposite from my model is decide whether it's "really" a Tree or a Forest. Why do that, when I can store it in normal form as Tree and Forest directly, thereby making my View and Controllers simpler without hurting my model? Searching for TreeData is also complicated by the parent-child relationship. I have to connect through N number of TreeComposites until I find the root node, and then search for TreeData pointing at the root node. This shreds data locality. Meanwhile, if I have a Tree with a ForestID, and I want that Tree's data, I never need to look at the Forest table at all. I can do a direct foreignkey <-> foreignkey join to TreeData. (where Tree.ForestID = TreeData.ForestID
).
How about a composite kind of structure? Let's say you have table TreeComposite which has a TreeComposite.ParentId foreign key that points to another TreeComposite. A TreeData can only reference a single TreeComposite so the constraint that it only reference either a forest or an individual tree is maintained.
The only problem I see is that you could have multiple levels of composition, which may or may not make sense depending on the problem you're trying to solve.
Your relationship between TreeComposite and TreeData would be many-to-many via an intermediate table, if I'm understanding your constraints properly.
Using this model you can treat a tree and a forest as though they were the same kind of object, and apply metadata with this in mind.
精彩评论