开发者

SQL - Can this relation have a primary key?

The relationship between bands and members, 'MemberOf', includes a member id (Mid), band id (Bid), 'StartYear', 'EndYear' and 'Instrument' played. Mid and Bid are both foreign keys to Member and Band relatio开发者_开发技巧ns respectively but I am struggling to identify the primary key in this situation for the 'MemberOf' relation.

Note: start year can be null if a member was in a band from the start and end year can also be null if a member is still in the band.

In this situation, start and end year cannot be used as PK's as they may contain null values, but the member id and band id is not enough information to be unique - e.g. the same member returns to the same band at a later date.

Does this mean no primary key exists? Thanks.


This is a common problem. To see what's going on, we need to disentangle the relational concept 'primary key' from the SQL construct PRIMARY KEY.

The relational concept 'primary key' is a unique identifier for the rows in a table. In the context of your table, the band and the member are clearly part of the primary key, and so too is the period during which the member played for the band. A single member could play for a specific band in, say, 2003-2005, go off on a solo career for 2006-2007, and rejoin his original band for 2008-2011. You'd need two entries for the member in the table, one for each of the two periods.

Ideally, you'd be able to say 'startYear' and 'endYear' form a single 'memberFor' period column and the primary key is (bid, mid, memberFor) and for a single combination of bid+mid, the memberFor values represent disjoint ranges: that is, for two rows R1, R2, the value R1.memberFor does not overlap or touch R2.memberFor. Example overlap: R1.memberFor = (2003-2005), R2.Memberfor = (2004-2006). Example touch: R1.memberFor = (2003-2005), R2.MemberFor = (2005-2007).

So much for the theory.

In practice, SQL does not support time periods, where a time period in this context has a start and a duration. The INTERVAL types do not qualify because they represent duration without the start, and the DATE, TIME, TIMESTAMP types do not qualify because they lack the duration (ignoring issues of quantization which are not germane to the matter at hand). Clearly, you can also represent a period by (duration, end) and also by (start, end), and in the latter case, you can have open, closed, open-closed or closed-open ranges for the start and end.

Because SQL does not support the requisite type, much less the necessary checking options, you have to do the checking yourself, and it is complex to do it. The SQL primary key only looks for equality of values. Unconstrained, it means that if you create the primary key on (bid, mid, start) or (bid, mid, end) or (bid, mid, start, end), you can end up with unwanted data in the table which satisfies the SQL primary key but not the conceptual primary key.

Bid     Mid     Start   End
  1       1      2003   2005   - α
  1       1      2004   2006   - β
  1       1      2004   2007   - γ
  1       1      2008   2011   - δ

Consider the row tagged α as correct. If your SQL primary key is on (bid, mid, start), then the row tagged β should not be allowed because it overlaps with α; similarly for γ. However, SQL only prevents β and γ both appearing; the one entered first will be OK and the one entered second will be rejected. If γ were allowed, then ∂ should not be allowed because it touches γ. Similarly, if the SQL primary key were on (bid, mid, start, end), all four rows would be allowed into the table, but clearly they should not be. And if the SQL primary key was on (bid, mid, end), all the rows above would be allowed in.

So, you would have to augment the criteria for the SQL primary key with a complex query that filters out unwanted rows. This would like be stored procedure used in an insert trigger for the table and in an update trigger.

…details left as exercise for reader…

In your table, you have a year-based granularity. You could probably, therefore, get away with a revised table that stored just (mid, bid, memberInYear), and have one row for each band member for each year when they were a member of the given band. This reduces to a large quantum time period where the SQL primary key on the three columns enforces the constraint. However, if you change the granularity of the table to record the start and end dates (down to the day), then clearly the revised design does not work well.

So, you should probably go with an SQL primary key on (bid, mid, start) backed up by a stored procedure invoked from insert and update triggers that enforces the non-overlap and non-touching criteria.


Note: start year can be null if a member was in a band from the start

Well, that ridiculous requirement is the biggest part of your problem. Store the year.

{BandId, MemberId, StartYear} as the primary key.

That won't accommodate members who start in February, quit in March, and start again in July, though.


You are right that the dates cannot be considered part of the PK as they are data that will change over time. I think you have two options:

1) Split the MemberOf record into a parent child pair with the child table, maybe called MemberOfYears, containing Mid, Bid, and a Year and the parent, MemberOf, just becomes Mid & Bid. So one MemberOfYears record would be included of each Year the member was a member of that particular band.

2) Add an artificial key to the MemberOf relationship table and use it as the primary.


One possible solution would be to store the StartYear value as the year the band was started. That way you could use (MId, BId, StartYear) as a primary key assuming members can only join and quit once per year (as suggested by your design).

To test for "original member" you compare MemberOf.StartYear with Band.StartYear for equality.

Otherwise no, you don't have a valid primary key and need to add some kind of disambiguating column (JoinNumber or something), which seems like undesirable extra effort.


SQL - Can this relation have a primary key?

  • Everything in human history happened at some point in time, so StartYear can not be NULL -- really.

If you introduce IsCurrent instead the EndYear then there are no NULLs at all. If you do choose to keep the EndYear, it is common practice to load it with some date in far future, like 3000-01-01. And finally, if you do keep and leave EndYear NULL -- it is not tragic, the only problem being that it may be hard to distinguish between missing data and the current membership.

Note: StartDate may be more appropriate than the StartYear.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜