开发者

Referential Integrity - how to configure SQLAlchemy?

Can someone help me get the big p开发者_如何学Pythonicture when it comes to configuring SQLAlchemy models, so referential integrity is ensured at all levels?

Going by the idea that referential integrity should be expressed and enforced by the DB, I have created a schema (currently in Postgresql) with all the constraints I think I need, thus giving me a database I trust will enforce referential integrity.

I then begin to build an app on top of this DB, using SQLAlchemy (0.7) in declarative mode.

Having searched and read a bit, I have learned that I can configure:

  • onupdate/ondelete rules on my Column() definitions.
  • cascade options on my relationship() definitions,

    and that these seem to operate on the session level in SQLAlchemy.

  • passive_deletes and passive_updates options for my relationship() definitions.

And that all these options have defaults.

But I am left confused as to how much I actually need to do with my SQLAlchemy models, to make sure SQLAlchemy doesn't get out of sync with the DB and its constraints during a session.

What exactly am I achieving, if I configure 'onupdate' etc. on my Columns() definitions in SQLAlchemy?

And for the cascade and passive_delete/passive_update rules I can configure on a relationship(). What do I need here, and why?

Or to rephrase my question: To what extend will SQLAlchemy be aware of the constraints configured in the DB schema, and to what extend (and how) do I have to repeat them in my models?

And are the anything else I should be aware of? :)


SQLAlchemy doesn't fundamentally need to know anything about the constraints your database has. If your database has the constraints you want configured, you're essentially done - your application is disallowed by the database from doing anything it's not supposed to.

A key theme of SQLAlchemy is that it is really only doing what you tell it to. So if you were to try persisting an object, SubWidget(), which in the database needs to have a reference to a parent Widget(), at the point at which SQLAlchemy flushes the data (i.e. emits INSERT statements), the operation would fail with a constraint violation, emitted by the database, and the transaction is rolled back.

So assuming an FK on "subwidget" referencing "widget", your application needs to ensure the data is in the correct structure. There are two ways to do this; one is that you hand-maintain those columns which contain foreign key references, and ensure they have the appropriate value at the point of INSERT or UPDATE. The other is that you'd use relationship() to manage the foreign key attribute, and you'd instead ensure that the creation of a SubWidget() object is accompanied by the operation of associating it with a parent Widget() object which you've created and/or acquired separately.

Regarding cascades, it's a great idea, though not required, to have ON DELETE CASCADE on those foreign keys where it applies. On the SQLAlchemy side, when using relationship() you usually want to give the ORM a hint that the database will cascade deletes via the passive_deletes flag ( http://www.sqlalchemy.org/docs/orm/collections.html?highlight=passive_deletes#using-passive-deletes ), however this is usually a performance enhancement; SQLAlchemy otherwise ensures that all objects represented on the dependent side of the relationship() are loaded into memory, and handled appropriately, which means one of either setting the foreign key attribute to NULL (the default), or marking the dependent object for deletion (which occurs by setting "cascade" to "all, delete-orphan", see http://www.sqlalchemy.org/docs/orm/session.html#cascades).

ON UPDATE cascade is less common as natural primary keys are not a common practice these days, since they don't actually perform as well as plain integer primary keys and can also be cumbersome in other ways. SQLAlchemy does support these as well however and they will generally take care of themselves, as SQLA assumes by default that update cascades are in place when a PK mutation takes place, see http://www.sqlalchemy.org/docs/orm/relationships.html#mutable-primary-keys-update-cascades for a detailed description of that.

Perhaps this is all easier to grok with a little experimentation, the basic idea is that SQLAlchemy is only emitting the SQL you're telling it to, even though many of its SQL behaviors are automated once configured up front. relationship() should be configured with details on how you'd like it to behave when data is persisted, modified, or deleted with respect to the constraints present in the database.


So, building on zzzeeks answer, and my own studying/tinkering after my initial question...

To make SQLAlchemy actively prevent that an in-session view of the DB state can divert from what the DB will allow when you flush/commit, you must mirror all the constraints found in the DB schema in your SQLAlchemy models.

And this is done via the Column definitions in the form of:

ForeignKey(..., onupdate='', ondelete='')
primary_key=True
unique=True

and so on, with the possible inclusion of __table_args__, like:

__table_args__ = (
        ForeignKeyConstraint(['id'], ['remote_table.id']),
        UniqueConstraint('foo'),
        )

For cases where the constraint spans multiple columns.

Whereas:

relationship()

and its related arguments, like:

cascade
cascade_backrefs
foreign_keys
passive_deletes
passive_updates

and so on, are a (important) convenience feature that allows you to work with your models with as little effort as possible, but ultimate aren't meant to be what prevents breaching referential integrity.

The relationship() feature can not express all the typical constraints in a database, while the Column() (and __table_args__) feature can.

On the other hand, configuring the relationship() with some of the arguments listed above (or going by default values where that makes most sense), will let SQLAlchemy perform tasks automatically, that ultimately can be said to be referential integerity related. And would otherwise normally have to be expressed by logic in the surrounding code.

Optimal configuration of relationship() will also in some cases avoid unnecessary SQL statements to be issued by SQLAlchemy.

Hope this summation is somewhat accurate...

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜