What are methods of programmatically detecting many-to-many relationships in an RDBMS?
I'm currently writing a Python ORM that gets all of its information from an RDBMS via introspection (I would go with XRecord if I were happy with it in other respects). The end user only says which tables/views to look at, and the ORM does everything else automatically (if it makes you actually write something and you're not looking for weird things and dangerous adventures, it's a bug).
The major part of that is detecting relationships, provided that the database has all relevant constraints in place and you have no naming conventions at all. I want this ORM to work with a database made by any crazy DBA who has his own views on what the columns and tables should be named. And I'm stuck at many-to-many relationships.
First, there can be compound keys. Then, there can be MTM relationships with three or more tables. Then, an MTM intermediary table might have its own data apart from keys: some data common to all tables it ties together.
What I want is a method to programmatically detect that a table X is an intermediary table tying tables A and B, and that any non-key data it has must belong to both A and B (and if I change a common attribute from within A, it should affect the same attribute in B). Are there common algorithms to do that? Or at least to make guesses which are right in 80% of the cases (provided the DBA is sane)?
If you have to ask, you shouldn't be doing this. I'm not saying that to be cruel, but Python already has several excellent ORMs that are well-tested and widely used. For example, SQLAlchemy supports the autoload=True attribute when defining tables, which makes it read the table definition - including all the stuff you're asking about - directly from the database. Why re-invent the wheel when someone else has already done 99.9% of the work?
My answer is to pick a Python ORM (such as SQLAlchemy) and add any "missing" functionality to that instead of starting from scratch. If it turns out to be a good idea, release your changes back to the main project so that everyone else can benefit from them. If it doesn't work out like you hoped, at least you'll already be using a common ORM that many other programmers can help you with.
Theoretically, any table with multiple foreign keys is in essence a many-to-many relation, which makes your question trivial. I suspect that what you need is a heuristic for when to use MTM patterns (rather than standard classes) in the object model. In that case, examine the limitations of the patterns you chose.
For example, you can model a simple MTM relationship (two tables, no attributes) by having lists as attributes on both types of objects. However, lists will not be enough if you have additional data on the relationship itself. So only invoke this pattern for tables with two columns, both with foreign keys.
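The "only foreign-key columns" rule above can be sketched roughly as follows. This is an illustration, not a general solution: it assumes SQLite (using its `PRAGMA table_info` and `PRAGMA foreign_key_list` introspection), the function name `pure_join_tables` and the demo schema are made up for the example, and it relaxes "two columns" to "exactly two foreign-key constraints covering every column" so that compound keys still qualify.

```python
import sqlite3


def pure_join_tables(conn):
    """Return {table: [referenced tables]} for tables that consist of
    exactly two foreign-key constraints and no other columns."""
    found = {}
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')}
        fk_columns = set()
        targets = {}  # constraint id -> referenced table
        # foreign_key_list rows: (id, seq, table, from, to, ...);
        # a compound key yields several rows sharing the same id.
        for row in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            targets[row[0]] = row[2]
            fk_columns.add(row[3])
        if len(targets) == 2 and fk_columns == columns:
            found[name] = sorted(targets.values())
    return found


# Hypothetical demo schema: one pure join table, one table with extra data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE author_book (
    author_id INTEGER REFERENCES author(id),
    book_id INTEGER REFERENCES book(id),
    PRIMARY KEY (author_id, book_id)
);
CREATE TABLE review (
    id INTEGER PRIMARY KEY,
    book_id INTEGER REFERENCES book(id),
    author_id INTEGER REFERENCES author(id),
    body TEXT
);
""")
print(pure_join_tables(conn))
```

Here `author_book` qualifies, while `review` does not because it carries its own `id` and `body` columns and would be mapped as a regular class.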
So far, I see only one technique that covers relations among more than two tables. A table X is assumed to be related to table Y if and only if X references Y no more than one table away. That is:
"Zero tables away" means X contains the foreign key to Y. No big deal, that's how we detect many-to-ones.
"One table away" means there is a table Z which itself has a foreign key referencing table X (these are easy to find), and a foreign key referencing table Y.
This greatly narrows the set of traits to look for (we don't have to care whether the intermediary table has any other attributes), and it covers any number of tables tied together in an MTM relation.
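As a rough sketch of the "no more than one table away" rule, assuming SQLite and its `PRAGMA foreign_key_list` for introspection (the helper names `foreign_keys` and `related` and the demo schema are invented for illustration):

```python
import sqlite3
from itertools import combinations


def foreign_keys(conn):
    """Map each table to the tables its FK constraints reference
    (one entry per constraint, so compound keys count once)."""
    result = {}
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for name in names:
        by_id = {}
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        for row in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            by_id[row[0]] = row[2]
        result[name] = [by_id[key] for key in sorted(by_id)]
    return result


def related(conn):
    """Return (direct, via): direct holds (X, Y) pairs where X has a
    foreign key to Y; via holds (X, Y, Z) where Z references both."""
    direct, via = set(), set()
    for table, refs in foreign_keys(conn).items():
        for target in refs:
            direct.add((table, target))      # zero tables away
        for a, b in combinations(refs, 2):   # one table away
            x, y = sorted((a, b))
            via.add((x, y, table))
    return sorted(direct), sorted(via)


# Hypothetical schema: one intermediary table tying three tables
# together and carrying an attribute of its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY);
CREATE TABLE course (id INTEGER PRIMARY KEY);
CREATE TABLE teacher (id INTEGER PRIMARY KEY);
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(id),
    course_id INTEGER REFERENCES course(id),
    teacher_id INTEGER REFERENCES teacher(id),
    grade TEXT
);
""")
direct, via = related(conn)
print(direct)
print(via)
```

Because pairs are built per constraint rather than per table, an intermediary table that references the same table twice (a self-referential MTM) is reported as well. The extra `grade` column plays no part in the detection, which is the point of the technique.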
If there are some interesting links or other methods, I'm willing to hear them.