开发者

Should I use a surrogate key (id= 1) or natural primary key (tag='sqlalchemy') for my sqlalchemy model?

On the database side, I gather that a natural primary key is preferable as long as it's not prohibitively long, which can cause indexing performance problems. But as I'm reading through projects that use sqlalchemy via google code search, I almost always find something like:

class MyClass(Base):
    __tablename__ = 'myclass'
    i开发者_StackOverflowd = Column(Integer, primary_key=True)

If I have a simple class, like a tag, where I only plan to store one value and require uniqueness anyway, what do I gain through a surrogate primary key, when I'm using sqlalchemy? One of the SQL books I'm reading suggests ORM's are a legitimate use of the 'antipattern,' but the ORMs he envisions sound more like ActiveRecord or Django. This comes up a few places in my model, but here's one:

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True) #should I drop this and add primary_key to Tag.tag?
    tag = Column(Unicode(25), unique=True) 
    ....

In my broader, relational model, Tag has multiple many-to-many relationships with other objects. So there will be a number of intermediate tables that have to store a longer key. Should I pick tag or id for my primary key?


Although ORMs or programming languages make some usages easier than others, I think that choosing primary key is a database design problem unrelated to ORM. It is more important to get database schema right on its own grounds. Databases tend to live longer than code that accesses them, anyways.

Search SO (and google) for more general questions on how to chose primary key, e.g.: https://stackoverflow.com/search?q=primary+key+natural+surrogate+database-design ( Surrogate vs. natural/business keys, Relational database design question - Surrogate-key or Natural-key?, When not to use surrogate primary keys?, ...)


I assume that Tag table will not be very large or very dynamic. In this case I would try to use tag as a primary key, unless there are important reasons to add some invisible to end user primary key, e.g.:

  • poor performance under real world data (measured, not imagined),

  • frequent changes of tag names (but then, I'd still use some unique string based on first used tag name as key),

  • invisible behind-the-scenes merging of tags (but, see previous point),

  • problems with different collations -- comparing international data -- in your RDBMS (but, ...)

  • ...


In general I observed that people tend to err in both directions:

  • by using complex multi-field "natural" keys (where particular fields are themselves opaque numbers), when table rows have their own identity and would benefit from having their own surrogate IDs,

  • by introducing random numeric codes for everything, instead of using short meaningful strings.

Meaningful primary key values -- if possible -- will prove themselves useful when browsing database by hand. You won't need multiple joins to figure out your data.


Personally I prefer surrogate keys in most places; The two biggest reasons for this are 1) integer keys are generally smaller/faster and 2) Updating data doesn't require cascades. That second point is a fairly important one for what you are doing; If there are several many to many tables referencing the tag table, then remember that if someone wants to update a tag (eg, to fix a spelling/case mistake, or to use a more/less specific word, etc), the update will need to be done across all of the tables at the same time.

I'm not saying that you should never use a natural key -- If I am certain that the natural key will never be changed, I will consider a natural key. Just be certain, otherwise it becomes a pain to maintain.


Whenever I see people (over)using surrogate keys, I remember Roy Hann's blog articles regarding this topic, especially the second and the third article:

  • http://community.actian.com/forum/blogs/rhann/127-surrogate-keys-part-2-boring-bit.html

  • http://community.actian.com/forum/blogs/rhann/128-surrogate-keys-part-3-surrogates-composites.html

I strongly suggest people reading them as these articles come from a person who has spent few decades as database expert.

Nowadays surrogate key usage reminds me of early years of the 21 century when people used XML for literally everything, both where it did belong, and where it did not belong.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜