开发者

PostgreSQL table design -- how to avoid reusing IDs

I'm designing an app with a rather complex table design using the Postgres database, and am stuck on one point that I was 开发者_JS百科hoping someone could offer advice about.

I have several tables, each of which has a feature id (or fid). Different types of entities have different attribute schemas, so I have to create a different table for each type. However, I want to make sure that the fids are unique across all entity types.

If I have three entity types, Entity1/2/3, represented by the following 3 tables:

Entity1             Entity2             Entity3
    fid                 fid                 fid
    attribute1          attribute2          attribute3

How do I ensure that there are no duplicate fids anywhere in the system?

Thanks!


PostgreSQL (and Oracle for that matter) use objects called Sequences for sequencial value generation. Unlike MySQL and SQL Server's approach to sequencial value generation, sequences are not tied to the table. So you can define a single sequence (documentation link):

CREATE SEQUENCE your_seq 

This will create a sequence called your_seq, which will start at 1 and increment by 1 every time the next value is retrieved - see the documentation link if you want to set the minimum value and increment value/etc differently.

To use it, any INSERT statement needs to include:

NEXTVAL('your_seq')

...in the position in the INSERT statement to populate the appropriate column. IE:

INSERT INTO entity1
  (fid)
VALUES
  (NEXTVAL('your_seq'))

And, to automate this a little more, you would set NEXTVAL('your_seq') as the default value on all the tables that use it.


It sounds like Watsuimoto and MkV are thinking something similar - have a base table hold FIDs and some specific entity tables inherit from that. Watsuimoto mentioned it wasn't working... if you can get it to work, then I agree with you both that it's the right solution. And maybe simulating it with FKs on the integer IDs isn't tight enough to help Watusimoto's customers from assigning two Entities to the same EntityBase.

A potential fix for that would be to have a compound key with the entity's type to help identify it. One example:

EntityTypes     EntityBase     Entity1      Entity2
-------------   ------------   ---------    ---------
TypeName (PK)   EntityID       ID           ID
                EntityType     EntityType   EntityType
                CommonAtts     Attribute1   Attribute2
                FID

Constraints:
-----------------------------------------------------------------
EntityBase:
        PK... lots of options.  Probably PK(EntityID, EntityType)
        UNIQUE(FID)
        FK(EntityType) on EntityTypes(TypeName)
Entity1   :
        PK(ID)  (or PK(ID, EntityID))
        EntityType NOT NULL
        CHECK(EntityType = "Entity1")  (e.g., it is constant)
        FK(EntityType) on EntityTypes(TypeName)
        FK(ID, EntityType) on EntityBase (ID, EntityType)
Entity2   :  <Ditto>

You've got a lot of flexibility here. You could set up per-type FIDs. You could make EntityIDs unique per type or unique across all Entities. You could make EntityBase have an ID separate from EntityID. You might even be able to make EntityType some kind of computed column, or default it, so you don't have to write a value to it.

If that's not your cup of tea because of the EntityType overhead, then I reluctantly offer this:

Entity1      Entity2           Features
---------    ---------   ...   ----------
ID (PK)      ID (PK)           FID (PK, arbitrary)
Attribute1   Attribute2        Entity1ID (FK)
                               Entity2ID (FK)
                               Entity3ID (FK)

Constraints:
-----------------------------------------------------------------
Features  :
        One and only one EntityID is NOT NULL

People who like to think of tables as "entity records" usually don't like this approach. It's unwieldy for more than a half dozen entities. But it is correct, and does allow you to keep your single-integer row IDs if you want.

When it comes to issues like this, I go to Ken Downs blog and look around. He's got some pretty good thoughts on relational design. That would be my first suggestion if I could find the article he posted on this topic. This article is the closest I could find.


I suggest using guids for your fid field. This way you can be sure that you won't have duplicate fids, and is more elegant than either (i) maintaining a 'highest fid' somewhere and querying it every time you do an insert operation, or (ii) putting code in that checks all your fids in all your tables every time you do an insert operation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜