Should I use a surrogate key (id= 1) or natural primary key (tag='sqlalchemy') for my sqlalchemy model?

2023-03-14 14:56 问答作者：

On the database side, I gather that a natural primary key is preferable as long as it's not prohibitively long, which can cause indexing performance problems. But as I'm reading through projects that use sqlalchemy via google code search, I almost always find something like:

class MyClass(Base):
    __tablename__ = 'myclass'
    i开发者_StackOverflowd = Column(Integer, primary_key=True)

If I have a simple class, like a tag, where I only plan to store one value and require uniqueness anyway, what do I gain through a surrogate primary key, when I'm using sqlalchemy? One of the SQL books I'm reading suggests ORM's are a legitimate use of the 'antipattern,' but the ORMs he envisions sound more like ActiveRecord or Django. This comes up a few places in my model, but here's one:

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True) #should I drop this and add primary_key to Tag.tag?
    tag = Column(Unicode(25), unique=True) 
    ....

In my broader, relational model, Tag has multiple many-to-many relationships with other objects. So there will be a number of intermediate tables that have to store a longer key. Should I pick tag or id for my primary key?

Although ORMs or programming languages make some usages easier than others, I think that choosing primary key is a database design problem unrelated to ORM. It is more important to get database schema right on its own grounds. Databases tend to live longer than code that accesses them, anyways.

Search SO (and google) for more general questions on how to chose primary key, e.g.: https://stackoverflow.com/search?q=primary+key+natural+surrogate+database-design ( Surrogate vs. natural/business keys, Relational database design question - Surrogate-key or Natural-key?, When not to use surrogate primary keys?, ...)

I assume that Tag table will not be very large or very dynamic. In this case I would try to use tag as a primary key, unless there are important reasons to add some invisible to end user primary key, e.g.:

poor performance under real world data (measured, not imagined),
frequent changes of tag names (but then, I'd still use some unique string based on first used tag name as key),
invisible behind-the-scenes merging of tags (but, see previous point),
problems with different collations -- comparing international data -- in your RDBMS (but, ...)
...

In general I observed that people tend to err in both directions:

by using complex multi-field "natural" keys (where particular fields are themselves opaque numbers), when table rows have their own identity and would benefit from having their own surrogate IDs,
by introducing random numeric codes for everything, instead of using short meaningful strings.

Meaningful primary key values -- if possible -- will prove themselves useful when browsing database by hand. You won't need multiple joins to figure out your data.

Personally I prefer surrogate keys in most places; The two biggest reasons for this are 1) integer keys are generally smaller/faster and 2) Updating data doesn't require cascades. That second point is a fairly important one for what you are doing; If there are several many to many tables referencing the tag table, then remember that if someone wants to update a tag (eg, to fix a spelling/case mistake, or to use a more/less specific word, etc), the update will need to be done across all of the tables at the same time.

I'm not saying that you should never use a natural key -- If I am certain that the natural key will never be changed, I will consider a natural key. Just be certain, otherwise it becomes a pain to maintain.

Whenever I see people (over)using surrogate keys, I remember Roy Hann's blog articles regarding this topic, especially the second and the third article:

http://community.actian.com/forum/blogs/rhann/127-surrogate-keys-part-2-boring-bit.html
http://community.actian.com/forum/blogs/rhann/128-surrogate-keys-part-3-surrogates-composites.html

I strongly suggest people reading them as these articles come from a person who has spent few decades as database expert.

Nowadays surrogate key usage reminds me of early years of the 21 century when people used XML for literally everything, both where it did belong, and where it did not belong.

继续阅读：orm primary-key python sql

Should I use a surrogate key (id= 1) or natural primary key (tag='sqlalchemy') for my sqlalchemy model?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？