开发者

Data Modeling: Logical Modeling Exercise

In trying to learn the art of data storage I have been trying to take in as much solid information as possible. PerformanceDBA posted some really helpful tutorials/examples in the following posts among others: is my data normalized? and Relational table naming convention. I already asked a subset question of this model here.

So to make sure I understood the concepts he presented and I have seen elsewhere I wanted to take things a step or two further and see if I am grasping the concepts. Hence the purpose of this post, which hopefully others can also learn from. Everything I present is conceptual to me and for learning rather than applying it in some production system. It would be cool to get some input from PerformanceDBA also since I used his models to get started, but I appreciate all input given from anyone.

As I am new to databases and especially modeling I will be the first to admit that I may not always ask the right questions, explain my thoughts clearly, or use the right verbage due to lack of expertise on the subject. So please keep that in mind and feel free to steer me in the right direction if I head off track.

If there is enough interest in this I would like to take this from the logical to physical phases to show the evolution of the process and share it here on Stack. I will keep this thread for the Logical Diagram though and start new one for the additional steps. For my understanding I will be building a MySQL DB in the end to run some tests and see if what I came up with actually works.

Here is the list of things that I want to capture in this conceptual model. Edit for V1.2

  1. The purpose of this is to list Bands, their members, and the Events that they will be appearing at, as well as offer music and other merchandise for sale
  2. Members will be able to match up with friends
  3. Members can write reviews on the Bands, their music, and their events.
    • There can only be one review per member on a given item, although they can edit their reviews and history will be maintained.
    • BandMembers will have the chance to write a single Comment on Reviews about the Band they are associated with. Collectively as a Band only one Comment is allowed per Review.
    • Members can then rate all Reviews and Comments but only once per given instance
  4. Members can select their favorite Bands, music, Merchandise, and Events
  5. Bands, Songs, and Events will be categorized into the type of Genre that they are and then further subcategorized into a SubGenre if necessary. It is ok for a Band or Event to fall into more then one Genre/SubGenre combination.
  6. Event date, time, and location will be posted for a given band and members can show that they will be attending the Event. An Event can be comprised of more than one Band, and multiple Events can take place at a single location on the same day
  7. Every party will be tied to at least one address and address history shall be maintained. Each party could also be tied to more then one address at a time (i.e. billing, shipping, physical)
  8. There will be stored profiles for Bands, BandMembers, and general members.

So there it is, maybe a bit involved but could be a great learning tool for many hopefully as the process evolves and input is given by the community. Any input?

Data Modeling: Logical Modeling Exercise

EDIT v1.1 In response to PerformanceDBA

U.3) That means no merchandise other than Band merchandise in the database. Correct ? That was my original thought but you got me thinking. Maybe the site would want to sell its own merchandise or even other merchandise from the bands. Not sure a mod to make for that. Would it require an entire rework of the Catalog section or just the identifying relationship that exists with the Band? Attempted a mod to sell both complete albums or song. Either way they would both be in electronic format only available for download. That is why I listed an Album as being comprised of Songs rather then 2 separate entities.

U.5) I understand what you bring up about the circular relation with Favorite. I would like to get to this “It is either one Entity with some form of differentiation (FavoriteType) which identifies its treatment” but how to is not clear to me. What am I missing here?

u.6) “Business Rules This is probably the only area you are weak in.”

Thanks for the honest response. I will readdress these but I hope to clear up some confusion in my head first with the responses I have posted back to you.

Q.1) Yes I would like to have Accepted, Rejected, and Blocked. I am not sure what you are referring to as to how this would change the logical model?

Q.2) A person does not have to be a User. They can exist only as a BandMember. Is that what you are asking?

Minor Issue

Zero, One, or More…Oops I admit I forgot to give this attention when building the model. I am submitting this version as is and will address in a future version. I need to read up more on Constraint Checking to make sure I am understanding things.

M.4) Depends if you envision OrderPurchase in the future. Can you expand as to what you mean here?

Data Modeling: Logical Modeling Exercise

EDIT V1.2 In response to PerformanceDBA input...

Lessons learned.

  1. I was mixing the concept of Identifying / Non-Identifying and Cardinality (i.e. Genre / SubGenre), and doing so inconsistently to make things worse.
  2. Associative Tables are not required in Logical Diagrams as their many-to-many relationships can be depicted and then expanded in the Physical Model.
  3. I was overlooking the Cardinality in a lot of the relationships
  4. The importance of reading through relationships using effective Verb Phrases to reassure I am modeling what I want to accomplish.

U.2) In the concept of this model it is only required to track a Venue as a location for an Event. No further data needs to be collected. With that being said Events will take place on a given EventDate and wil开发者_StackOverflowl be hosted at a Venue. Venues will host multiple events and possibly multiple events on a given date. In my new model my thinking was that EventDate is already tied to Event . Therefore, Venue will not need a relationship with EventDate. The 5th and 6th bullets you have listed under U.2) leave me questioning my thinking though. Am I missing something here?

U.3) Is it time to move the link between Item and Band up to Item and Party instead? With the current design I don't see a possibility to sell merchandise not tied to the band as you have brought up.

U.5) I left as per your input rather than making it a discrete Supertype/Subtype Relationship as I don’t see a benefit of having that type of roll up.

Additional Revisions

AR.1) After going through the exercise for FavoriteItem, I feel that Item to Review requires a many-to-many relationship so that is indicated. Necessary?

Data Modeling: Logical Modeling Exercise

Ok here we go for v1.3

I took a few days on this version, going back and forth with my design. Once the logical process is complete, as I want to see if I am on the right track, I will go through in depth what I had learned and the troubles I faced as a beginner going through this process. The big point for this version was it took throwing in some Keys to help see what I was missing in the past. Going through the process of doing a matrix proved to be of great help also. Regardless of anything, if it wasn't for the input given by PerformanceDBA I would still be a lost soul wondering in the dark. Who knows my current design might reaffirm that I still am, but I have learned a lot so I am know I at least have a flashlight in my hand.

At this point in time I admit that I am still confused about identifying and non-identifying relationships. In my model I had to use non-identifying relationships with non nulls just to join the relationships I wanted to model. In reading a lot on the subject there seems to be a lot of disagreement and indecisiveness on the subject so I did what I thought represented the right things in my model. When to force (identifying) and when to be free (non-identifying)? Anyone have inputs?

Data Modeling: Logical Modeling Exercise

EDIT V1.4

Ok took the V1.3 inputs and cleaned things up for this V1.4

Currently working on a V1.5 to include attributes.

Data Modeling: Logical Modeling Exercise

EDIT V1.6

Okay, it has been some time since I have posted on here but the work on this project is still ongoing. I am posting V1.6 now which includes a number of changes from the last posting of V1.4. This version shows the further evolution of the Keys. It still does not include the attributes or any AK's or IE's. I have started working on the physical model and used that to help work through the attributes and to try and shed some light on the problems I am having with defining the AK's and IE's. The next posting of the Logical Model will include these keys and the attributes.

Data Modeling: Logical Modeling Exercise


Method

I will cover specifics, but I will cover one or two Subject Areas completely, not all. You can pick that up and apply it to all subject Areas.

I have not responded to the core Subject Area, because we are still dealing with Identifying Entities. When that is resolved the Reviews, etc will be easier; the Transaction Entities are Dependent on the Identifying Entities.

Direction

D.1) I know that I stated that I need to see the whole model. There is one exception. Historic or Temporal or Audit data (eg. the Edit and stored versions). At this early stage, they can be set aside; to be implemented just before completion of the Logical Model. This is in recognition that (a) they are simple Dependents of some parent (b) the parents need to be modelled in relation to all other tables first, and (c) to exclude unnecessary complications, and thus allow us to concentrate on the relevant field.

  • in particular, you can ignore the tense in the Verb Phrases (every location of a version table would otherwise require Has/Had). Stay with present tense for now, because the focus is modelling, not archiving.

Unresolved

U.1) Optional Parent
That is completely disallowed. Not just by IDEF1X, but by any notion of Integrity. If the FK Reference is defined, then there must be a Parent. To allow optional parents, the FK Reference must be removed (or not implemented). Such a condition would exclude the result from qualfying as a "Relational database", by definition. Eg. Address:Order.

  • Of course, in developed countries, an Order must have an Address for legal or taxation reasons; that is separate to the Standard requirement issue.
    .

U.2) Event
Party::PartyAddress is correct; Address::PartyAdress is correct. Event::Address needs work. Address is an Identifying Reference table; if used, it would be the parent, Event would be the child. I leave it to you to identify/model multiple Events to a location, and Events at one or multiple locations.

  • There may be a Venue involved. Or a EventOccurrence

  • But if it is a generic Event which happens at multiple locations, that does not need an Entity, the Address is already in Order.

U.3) Assuming Catalog is an entry in the traditional sense (JCPenney 2011), a list of items for sale or hire.

  • OrderSaleItem is correct

  • Critical point. Catalog is Dependent, and can exist only in the context of a Band, as an Assset. Fine. That means no merchandise other than Band merchandise in the database. Correct ?

  • I can see how "Evening performance with the Blues Brothers" is an Event that can be ordered, invoiced, and paid. Also reviewed, commented, etc.

  • I can't see how Song fits into that. Are the bands selling albums, songs, or both ?

  • Is there no other Band merchandise: concert/event souvenirs; poster; engraved shot glasses ?

  • Consistent with the naming conventions that you reference, and the rest of the database, Catalog (the cotent) should be named Item (the row). You have already (naturally ?) used that in OrderSaleItem,( as opposed to OrderSaleCatalog.

U.4) Genre

  • No problem with an Item is classified by one-to-many Genres.

  • I think additionally a Genre classifies one-to-many Items. The Relation is one-to-many (which will be resolved as an Associative table when we get to the Physical).

U.5) Favorite
The Cardinality of Item::Favorite is reversed. When you correct that, the Favorite Subject Area will require further modelling.

  • Circular relation or dual paths between the same pair of Entities is a signal of an unresolved model. Generally one is correct and the other is redundant. (There are exceptions, but not here; and when this happens the Verb Phrases differentiate them.)

  • Either Band::Favorite xor Item::Favorite is correct, not both.

  • Item::Favorite seems to be correct, because Band is already identified in Item

  • Likewise, one Favorite Entity for bands and merchandise does not sound solid. Every Identifier in the single Favorite Entity is a Party. It would break when we Normalise, might as well demand that the Identifiers be clarified at this stage. It is either one Entity with some form of differentiation (FavoriteType) which identifies its treatment; or one Favorite for bands and another for merchandise, in which case differentiation is not required, ambiguity is eliminated.

U.6) Business Rules This is probably the only area you are weak in. General response. You have done the tasks separately (all the modelling vs writing BRs). These do not match the model. When you go through the next cycle, take the Business Rules as directives, and modulate them at the same time, as with the Entities, the Relations, and the Verb Phrases.

Question

Q.1) User/Friend
You have the essence of it perfectly. And the Cardinality of the Relations. (Full treatment on this one.) That is correct for Accepted Friend.

  • therefore the tense should be past (go with the majority rows)

  • Requested, and pending Accepted, are the minority. Easily implemented in a IsAccepted Bit or Boolean.

  • Later you may have IsRejected or IsBlocked (that latter should be a separate Entity).

  • Is that what you require ?

Q.2) What is the basis on which a Person is zero-to-many Users ?

Minor Issue

M.1) Singular only.

M.2) Party Has zero-to-many Addresses. I would think they must have one, in order to transact business (but perhaps not for all Users).

M.3) Order May Have zero-to-many Payments. "Requires" means that first Payment has to be inserted at the same time as Order.

  • Likewise, for any mandatory children (one-to-many as opposed to zero-to-many) that first child must be inserted at the same time as the parent. This is done via Transactions in enterprise databases, because Immediate Constraint Checking (not Deferred) is implemented; and the small end of town fight over silly things like Deferred Constraint Checking is "better" and then spend half their life figuring out how not to get caught in the infinite loops they created, which trap them. MySQL does not have any at all, so nothing to worry about for this implementation.

M.4) OrderSaleItem shoulld be OrderItem xor Order should be OrderSale. Depends if you envision OrderPurchase in the future.

Subject Area Example

Readers who are unfamiliar with the Standard for Modelling Relational Databases may find IDEF1X Notation useful.

As stated, I am not providing a finished Data Model, only guidance. This is just one progression of one selected Subject Area. It is not "right" or complete in any way.

  • Your Verb Phrases are excellent. I have provided alternatives for you to consider, they are not "right" or "better". You need to choose an progress them or your own. The goal being the most concise and accurate VP in each case.

  • No suggestion that Person is correct and User is incorrect, that is pending your answer. But I had to use something in the model; since you have modelled them as separate, a counterpoint may be interesting to evaluate.

So go ahead and progress the model, then post again (just edit the question, leaving the header paras, and replacing the rest).

V1.1 and Response

That is certainly a progression.

I have re-numbered the items in pseudo-legal format, including the section headings, so that we can keep the numbering throughout, and keep adding to it. Actually it really eases the SO editing problems as well.

U.3) Would it require an entire rework of the Catalog section or just the identifying relationship that exists with the Band?

  • No. That's the great thing about working at this level, the decisions you make here will be the railroad tracks that the data runs on, as freight, or does not run on (and thus needs alternate transport and heavy lifting to derive, in the form of masses of code or an additional data warehouse). And the decisions here are cheap (modelling time, paper).

  • Right now an Item exists only in the context of a Band. It is Dependent. To allow non-band merchandise, it needs to be Independent. And then the existing super/subtype cluster needs rework.

Attempted a mod to sell both complete albums or song. Either way they would both be in electronic format only available for download. That is why I listed an Album as being comprised of Songs

  • OK. But now you can only sell albums, not songs.

rather then 2 separate entities.

  • Not sure what you mean (you have two separate entities).

  • It appears you have not seen my Subject Area Example. Note that if you open it now, it contains bits that I have added V1.1; I have not changed what was there yesterday, the V1.0 response.

  • Actually that means you should go through my V1.0 Answer again, while viewing the Example.

U.5) ... but how to is not clear to me. What am I missing here?

  • An example of one Entity with differentiation is any of the Supertype/Subtype clusters you have. The Favorite is the Supertype, BandFavourite and ItemFavourite are subtypes; allowing each to reference to Band xor Item respectively.

  • You have modelled ItemFavourite. Now the question is, does the fact of a ItemFavourite imply that the Band is Favourite; or is BandFavourite a discrete fact ? In the example, I have modelled the latter, without the Favourite::ItemFavourite/BandFavourite structure.

Q.1) Yes I would like to have Accepted, Rejected, and Blocked. I am not sure what you are referring to as to how this would change the logical model?

  • No change (I already stated it was pretty complete) to V1.0, but you might need an additional Entity.

  • You need three Bit or Boolean indicators in Friend. That will service these statuses:

  • Requested (but not Accepted)

  • Requested & Accepted
    .

  • But Blocked is not a Friend (or could have been a Friend previously, but not since being Blocked). So either the Entity name has to change to reflect that (no change to the two Relations) xor Blocked has to be a separate Entity. Two separate meanings for the second Relation leads to complexity, therefore I would go with the latter.

With the former, we have additional statuses:

  • Blocked
    .
  • Then the Verb Phrases need change (and I will include the RoleName for clarity), and one of them has a alternate meaning. .
  • (It will be much more clear at the Attribute level Model, that's why we model in pictures, not words; so I have included it.)

Q.2) A person does not have to be a User. They can exist only as a BandMember. Is that what you are asking?

  • No. Why do we need to differentiate Person and User ? What are the separate actions or attributes ? Thus far, I see Person and User as the same Entity; Person is an User with no activity.

  • This is the last item, holding us back from dealing with the core Subject Area.

M.3) I need to read up more on Constraint Checking to make sure I am understanding things.

  • Don't worry about that now; I was giving you reason to keep it simple (the non-compliant SQL databases appear to simplify things but actually they make it more complex). MySQL has none of those capabilities, so you can eliminate consideration of the platform, and just model the Cardinality meaningfully.

M.4) Depends if you envision OrderPurchase in the future. Can you expand as to what you mean here?

  • In the context of the Model. You provide the structures to make SalesOrders (of Items). Therefore Item, Order and OrderItem.

  • But if you provided the structures to track PurchaseOrders as well (to purchase Items as well as office supplies, rent, whatever), then you need to differentiate Sales Orders and Purchase Orders. Therefore:

  • Item

  • OrderSale and OrderSaleItem

  • OrderPurchase and OrderPurchaseItem

Version 1.1

U.2) Event Progressed

  • EventDate looks good. I would define the Relation as Event Was Perfromed On EvenDate.

  • Whereas ItemGenre is perfect, Event::Venue Needs work. This is a mistake you make consistently, so an explanation is called for.

  • You have modelled Venue correctly, it is Independent and does exist outside the context of Event. But Event May Be [Held] At zero-to-many [Independent] Venues is not possible.

  • Events are held at many Venues, and Venues host many Events. If that was all, since this is the Logical Level, you can draw a many-to-many Relation, and you are done. At the Physical level, that Relation is resolved by implementing an Associative Table, of which the PK is the two parent PKs, and there is no data. (Enemy is a good example.)

  • But if there is data (eg. you need to track the date or number of attendees or whatever), then it is not an Associative Table, it is another Entity. A Thing that Takes Place between Event and Venue.

  • EventDate is a good candidate. We already have that, and the date. Just add Venue and stir. I would call the Thing that Takes Place between Event and Venue a Performance.

  • Likewise, EventAddress has progressed but is not complete.

  • Do Events have Addresses or Venues have Addresses ? (model it, no need for words)

  • If Venue: do you need all the historic Addresses for the Venue (like Party), or just the current one (like Order) ?

M.5) SubGenre. Can you explain why SubGenre is (a) Independent and (b) the Relation is Non-Identifying.

M.6) Item Is zero-to-many Favourites. Therefore: Item Is a Favourite of zero-to-many Users. Likewise, Each User Chooses zero-to-many Favourites. Therefore Each User Chooses zero-to-many Favourite Items.

V1.2 and Response

Great Progress.

U.2) Event Further Progressed

Going by your Edit as well as the new Requirements, some yes and some no. All the other Subject Areas of the Data Model are pretty much complete (for Logical), this one area is confused, not nearly as resolved. Partly because of the added Requirements (no complaint, that happens in real life; it is about how you handle it).

The main point I will make here is that the Data Model should always model the real world, as opposed to only the business Requirement. That (a) insulates the DM from the effect of change and (b) provides a solid platform for added Requirements. That does not mean you have to model the whole real world, but the parts of it that you do model must reflect reality and not be squished up to fill just the Requirement.

Second, there is lack of clarity about the distinctions between Event, Band-Event, Performance, etc. Right now an Event is a Party-Band-Item-Event. That's fine, but it does not work for the new style Event per Requirement.

Third, you have a good handle on Address re Party and Order, but not re Venue.

  • Since you are accepting the Standard-compliant model and therefore the treatment, Address is a Reference table.

  • It is Independent (square corners)

  • Actually, you can place Address and everything above it on page one; making this part of the model page two, and have Address only on this page.

  • Correctly modelled: A Party has a history of Addresses. They must have at least one current { IsBilling | IsShipping | IsPhysical } Address, based on whatever activity is being executed.

  • Correctly modelled: An Order has one IsBilling Address (if you need IsShipping, you need to add a separate Relation).

  • Address is not a child of Venue (also Independent, correct). I do not think a Venue is located in zero-to-many Addresses. (Maybe that is the old Cardinality-reversed bug, but I am not sure, due to the other confusion re Event and Venue.)

  • Actually Address::Order is suspicious. (Q.3) Do you want Order to reference any valid Address, or a specific address for the Party executing the Order ?

  • Back to Event. Accepting EventDate as declared. That's fine but then Reviews etc, apply to the generic concert and not the single concert which they performed on mushrooms. Go for V1.3.

  • Your terminology re Event, etc is consistent with the Requirement, etc. but it does not support the Requirement as stated.

  • So let us start using "Event" the way it is used in the real world, and model it that way. What we have been calling "Event", the Party-Band-Item, is actually a Performance. And not a generic one that is scheduled, but a single one at a specific Venue.

  • That is either what you meant with EventDate, or EventDate resolves into Performance.

If you do not mind, I will avoid typing one thousand words, and give you a picture. Subject Area Example V1.2

  • Notice that the multiple Bands per Event is resolved.

  • And the Verb Phrases are straight from heaven. An Address hosted multiple Venues, each of which catered multiple Events, each of which is multiple Performances, each of which is one Party-Band-Item.

U.3) Is it time to move the link between Item and Band up to Item and Party instead? With the current design I don't see a possibility to sell merchandise not tied to the band as you have brought up.

  • First, we need to use Relational terminology, not because I am a pedant, but because the real gurus say it really helps to make the transition to the Relational world.

  • Second, we cannot accomplish that by "moving the Relation".

  • You have to model non-Band merchandise: how you are going to sell it; track it; get paid for it. Whether you want Reviews and Responses, etc. I do not see what Party has to do with it, and right now we are selling Band-Items, not Party-Items. Consider the Referential Integrity issues.

Version 1.2

AR.1) After going through the exercise for FavoriteItem, I feel that Item to Review requires a many-to-many relationship so that is indicated. Necessary?

  • In V1.1, An Item had many Reviews, and a Review was about one Item. A Person generated many Reviews (one per Item). That is logical.

  • A Review is about many Items is not reasonable.

  • If anything, now that FavouriteItem/FavouriteBand is resolved, Review needs likewise resolution and distinction: do we need to differentiate BandReview from ItemReview; does a good/bad ItemReview indicate a good/bad BandReview or are they discrete ?

  • a Review (as it stands) cannot be about either a Band or an Item. That means two Foreign Keys, and one of the will Null, and Null FKs are not allowed. Item and Band are alreay differentiated, and that differentiation is mature.

  • ItemReviews can be summarised, etc, but that is a different story.

U.7) That leaves us with a new issue to resolve. If a Review can be about a Band or Album or Song or Performance, how do we ensure that Referential Integrity. We do not need an AlbumReview to reference a SongReview, etc. Model it.

R.5) The model currently provides Genre at the Item level, that means Album and Song (Merchandise can be disallowed via a CHECK Constraint). Not Band. That may be enough, given that (a) bands change over time, (b) that kind of classification at the Item level is more precise, and (c) Band Genre can be easily derived from their Albums or Songs.

  • If you need separate Band Genres, you need to add that.

  • What about Event Genre ? If you need it, I think it will be one Genre per Event.

  • Keep in mind that tables like Venue and Genre are serious search criteria in a major database. Vectors for analysis.

  • The Data Warehouse boys need to add this in as Dimensions to their Facts; in a properly modelled Database, they already exist as Dimensions to Facts. Show me all the Venues with "Folk Music" Events scheduled that attracted more than 10,000 People is dead easy.
    .

  • Discussion Point. Not saying the above is incorrect. What I have found in both Databases and iTunes is, precision counts. Why have laissez faire Genre::Several things when you can have Genre ::Specific Thing. If you had Genre::Song only, and Song has one Genre only, then Album and Band are precise roll-ups. The way we have it now, it depends on the music knowledge of the data entry person, and Genre::Thing is many, so it is loose. Genre::Song is tight.

R.6) members can show that they will be attending the Event is not modelled. Also clarify interest vs booking vs attendance.

R.8) Is not modelled.

M.3) The issue is closed, but the Verb Phrase remains unchanged.

M.7) Logical Model vis-a-vis Associative tables. Now that that issue is closed, remove any Associative tables for the Logical model; any remaining tables (between two parents) will contain data. That means, go through all the Dependent tables and remove any that do not have data. Thus V1.3 should be less cluttered.

M.8) Item is OrderItem.

M.9) Now that Party-Person-User is resolved. An Exclusive Subtype structure requires a Discriminator, and the Constrainst will be used to enforce Integrity. Where there are many, PartyType is the way to go. But for just two, a column IsBand or IsPerson is adequate.

M.10) You have corrected the cardinality-reversed bug, but some Verb Phrases are still going the wrong way.

27 Jan 11

Actually, I think a lot of these issues would be clearer if we move into the Logical Key/Attribute level (rather than just Entity Relation level). And it is high time we did. For example:

Q.3) Order:Address is suspicious. The constraint is not quite correct because that would allow the order to have any Address, not an Address that is specific to the Party executing the order.

But since you are MySQL, which has no Referential Integrity, you may not be aware of how it is done in real SQL, so I will provide the FK Definitions, which happen to be RI Constraints as well. It is kind of unfair to expect you to understand my terse statements, which are based in the RM, Normalisation and supported by SQL, when you do not have SQL.

  • In order for the two constraints to be true, since Party must be the same in each Constraint (there is only one Order.PartyId), only the subset of PartyAddress which belongs to PartyId, will be allowed.

Address Qualification Example

Continued in Part II ...


... Part II V1.3 and Response

All the issues are either minor, or relate to the new step you are learning.

Identifiers vs Id columns

I am not going to give you a full rundown here, as I have posted at least 20 times about how Id columns cripple a database and rob it of Relational Power. I will deal with the issue in the context of this question only.

  • Here is an example, check the question in detail first. Note that Mark is quite capable, but completely stuck. Then read my answer, then look at the Data Model. (Please do that now, it provides context)

  • The idea is either model the data, as data, which we are doing, and you will end up with a database, xor stick Id columns on everything that moves, which obstructs the modelling exercise and Normalisation, and you will end up with a bunch of spreadsheets "linked" to each other with massive duplication and no performance.

  • Therefore, remove all columns of the form [Table]Id from all tables (leave the Migrated Keys alone, they are correct), except the Following tables (these are the major Identifiers, reflected throughout the database. Note how ERwin will correct all child, grandchild, etc. tables:
    Party
    Address
    Item

Relational/IDEF1X Identifiers

You are learning about Identifiers. These are the Natural Keys. Either Keys that the user uses, or Keys that have been Migrated from a parent to a child as Foreign Keys. These are therefore not only identifying the Relation but also Identifying the child. Your last name tells me not just about you, but also about your father, and also that you are your father's son. Want to make that unique ? No problem, just add a first name.

You have been reading my answers, looking at my Data Models, and then adding Identifiers to your model. It is *much easier than that. ERwin (since it implements IDEF1X) does that for you.

  • Take Party, Band and Person. The Identifier for Party is PartyId
    (ok, that is a Surrogate Key, not a Natural key; but the Natural Key Lastname, FirstName,BirthDate, etc. is very long, if we use that as the Primary Key, it would be Migrated to the children, grandchildren, great-grandchildren, which is not desirable, so we add a short Surrogate Key, and make that the Primary Key)

  • When you create the subtypes in ERwin, and indicate the Relation, it will automatically place PartyId in Band and Person, as the PK; it will mark it as "(FK)". (Note: I use bold font to denote (FK) in my models.)

  • That's it, you are done. Party::Band is 1:0-1, Band Primary Key is PartyId. Because it is a Subtype, ERwin will ensure the Relation is Identifying, and therefore the parent PK ends up in the child PK, and the Dependent child has round corners.

  • If subtypes were not involved, it would be the same, except the Relation may not be 1::0-1, it may be 1::1-n. In which case, you need to add another element to make it unique, such as fFirstName, or SequenceNo

  • And you have to indicate to ERwin that you want an Identifying Relation. (If you don't then it is a plain FK, and the columns will be below the line; the table will be Independent,; the corners square).

  • And if at some point you decide to use those FK columns to form the PK, you simply click on the Relation and change it from Non-identifying to Identifying; the columns will be moved above the line; the corners will be round.

Role

  • Now for the next step. We know that Band::Party is 1::1; that Band is a child of Party; that Band.PartyId is the perfect PK (no Id column is required). Same for Person. But they are silly names, or put another way, Band is actually a different Role to Person, and they are both a Party. So we want to identify the Role clearly.

  • In Band, we would like to call PartyId, BandId, to reflect its Role. Edit the Relation, between the subtype symbol and the child, not the table. In the dialogue, fill in the RoleName as BandId. That's it. You are done.

  • Thus the following change from ... to:

    FloorItem.ItemId       FloorItemId
    BandItem.ItemId        BandItemId
    Consequently ...
    Other.BandItemId       OtherId
    Album.BandItemId       AlbumId
    Song.BandItemId        SongId
    Performance.BandItemId PerformanceId

  • Removing all the [Table]Id columns will leave the following tables without a PK. For now, add a Name column as the PK. You can tell me later what the user would like for a natural key, an Identifier for these tables: Event
    Genre

  • PartyAddress is an example (ie. modelled correctly) of what all I have discussed above. It had no PartyAddressId. PartyId and AddressId together form the PK. Both Relations are Identifying.

Identifying vs Non-identifying Relations

In reading a lot on the subject there seems to be a lot of disagreement and indecisiveness on the subject

  • Yes. Unfortunately, anyone with a keyboard and a modem can "publish" these days. People post opinions as facts; they post nonsense about subjects they are clueless about. This confuses people who are trying to learn.

  • It is science, not magic or a black art, not opinion.

  • When learning, read only definitions, and listen only to people who clearly transfer the science (not to anyone who is confused or treats the science like it is an art or that it is subject to opinion). We are learning facts, laws of physics, not opinions about the laws; the laws work the same for everyone, across the planet. You can't learn from someone who thinks a fact is an opinion.

Let's take it from the top:

  1. The Relation is the defining criteria (renders the child Independent/Dependent), not the other way round.

  2. A Relation is always an FK in the child, of the parent PK.

  3. In an Identifying Relation, that FK is the PK (or the first part of the PK, where the PK is a composite key). And the child is a Dependent table.

  4. In a Non-Identifying Relation that FK is a non-PK column, and the child is Independent (it may be forced into Dependence by some other Relation).

  5. All Subtypes have Identifying Relations from the Supertype. Otherwise they would not be Subtypes, they would be Independent of the Supertype.

  6. All 1:0-1 Relations are Identifying.

so I did what I thought represented the right things in my model.

  • Which may be why you ended up adding [Table]Id keys.

When to force (identifying) and when to be free (non-identifying)?

  • Never force anything re modelling (Database and Function), especially re data. It is the uncontrollable that we want to administer, manage, mould, control, etc. But to do that effectively, we have to understand it first. We cannot understand anything when we force it. Forcing it deprives it from exposing itself, and deprives us from noticing the subtleties and flavours (because we "know" it). Let it be free, but constrained, like a horse in a paddock, not a prisoner in a stable.

That is why the act of sticking Id columns on every spreadsheet prevents understanding of the data, and therefore any modelling of it.

  • As per above, it is the Relation that is Identifying or not; not whether the Entity is Independent or not, that is a consequence.

Do it Relational/IDEF1X/ERwin style:

  1. You want an Entity, draw an Entity. Name it. Unless it is the first entity on the canvas, do not add keys.

  2. Now consider its Relations. How do the Entities you have already modelled relate to this new Entity ? Draw that Relation (Relations are drawn Parent-to-child).

  3. Of course, it defaults to Identifying, because most Relations in a (wait for it) Relational Database are Identifying. The parent PK is placed in the child PK.

  4. If you think, no, no, I want this to be Independent, then you better have a good reason. The key question here is, does this entity exist completely on its own, does it exist outside the context of other Independent entities ? AFAIC, there are five in your model:
    Address
    Party
    Item
    Event
    Genre

Every other entity exists only within the context of one of these Independent Entities. Thus you drew Identifying Relations, and thus they are all Dependent.

  • Recall, we had Item as Independent earlier; then we had a new form of Item; which made the old Item, BandItem; which made BandItem Dependent on the new Item.

  • We had a great Identifier in ItemId, which was carried not only in the (then) Item cluster but throughout, in OrderItem, Review, etc.

  • We changed the context of Item (created a higher order Item), and due to the Identifying Relations, that was then Migrated throughout, and the new BandItem was Migrated in its context.

  • The new ItemId continues to be a great Identifier. BandItemId is exactly ItemId, but plays a particular Role, it is a subset/subtype of ItemId.

  1. So if it is a true Independent entity, go ahead and give it a new PK.
  • But at this stage, not an Id column, something meaningful that identifies the entity. Event.Name, Customer.Code. No human being identifies a Customer as number 123456, no, they think of "IBM", "3M", etc. Later on, as the model progresses, we will make sure we have really good Keys; right now with the new Entity, we care that it has an Identifier.

  • Exception. For Address, Party, Item, you knew at V1.0 you were going to have millions, thousands, thousands of them; that these were major Identifiers that would be MIgrated throughout the database; that the true PK was very long; and that you needed a short Surrogate Key as the PK; so you set that up from the outset, and you got no argument from me.

    • If you are ready for Domains, then INT, INTor SMALLINT, SMALLINT.

    • Otherwise Name, CHAR(30).

  1. The next step is to finish the PK on the new entity. If the cardinality from the parent is 1::n, it already has the PK of the Parent, just add an element to make the PK unique. Let's look at Order. It already has PartId, so OrderNo can be within PartyId. Just change the order of the PK columns to (1) PartyId, (2) OrderNo.

  2. The only time we do a little bit of forcing, is when the number of columns forming the PK becomes too many, or the total width of the PK becomes too wide, to Migrate as an FK into the children. Then, and only then, we create an additional Surrogate Key of the form [Table]Id (they are always additional, we can't lose the real PK or the uniqueness, because it supports other requirements).

  • AFAIC, that magic number is seven (magic no for a many things, actually; even this item appears as number seven), and that maximum width is 30 bytes. That was done from the outset with Address (already highly optimised), Party (otherwise 64 bytes), Item (over 30 bytes).

  • If we are going to break the intrinsic Relational power, we need the pain of carrying that Relational power itself to be really bad, and for no other reason. Not even approaching that in your model.

Review Cluster

You've done a very good job, so consider this as the next progression. Basically you have two options, and of course we are comparing/relating this to the Item cluster.

  1. Going with the Review cluster as is. We need a SongReview and an AlbumReview. And get rid of ItemReview (that encapsulates all Items, which means we are doubling up). I thought we were excluding Reviews for non-Band Items.

  2. Allow the non-BandReview to be about any BandItem, eg. change the ItemReview FK from Item to BandItem. That encapsulates all BandItems in ItemReview. Get rid of PerformanceReview.

  • Sure, you may not want BandItem.Other to be reviewed; that can be constrained by other means. But if you want to be strict, then you need (1).

Colour

It is great that you have adopted my colour scheme.

  • The meaning, the visual relevance, does not show up in a tiny model (most of my models on SO); it only shows up on larger models such as yours.

  • Because you have done such a great job with V1.3, I have created an ERD for you. Actually, the IDEF1X Notation document is worth reading again, it is very condensed, and I am told that people get more value out of it when they read it after modelling something. What I need to know is, whether the Natural Hierarchy and the Colour do anything for you.

  • That's just finishing off the Entity level Logical.

You can continue with the Logical, Key level (the only Attributes are FKs, and we know what they are). But feel free to start identifying Attributes (in which case, show the Attribute Level).

Optional Column

U.1) An Optional Parent has crept back into the model. PartyAddress is shipped for Order is not Nullable.

  • If you intended to model that the shipping address is optional, then you need an OrderShipAddress Entity, which is a child of Order, and the cardinality is 1::0-1.

  • Nulls (optional columns) are like a cancer collecting all over the body, Nullable FKs (optional parent) are cancer of the throat in an orphan before the age of five.
    .

  • That's the basic method to handle any optional column, not limited to one that is an FK (optional parent) such as here.

Minor

M.11) These were correct in V1.2
Review::Comment is 1::0-1
BandMember:: Comment is 1:0-n

M.12) Event::Person is n::n (and the columns will not show at the logical level)

V1.4 and Response

Very good progress. Are you happy with the Identifiers, the Keys ?

U.8) (If you do this first, the remainder will follow easily.) ERwin Limitation. Congratulations, you have produced a model that has reached the limitations of ERwin's capability in Logical modelling. To be clear, this is not really a limit, in that it gets resolved in the Physical Model, and of course it is not a limitation in IDEF1X or Relational Databases. But right now, at the Logical, it interferes with your learning and progress.

  • In BandItem we want the PK to be (BandItemId, BandId). But ERwin won't allow it because it says a Subtype PK must be the Supertype PK and nothing but. Actually, as long as the Supertype PK is the leading Identifier, another Identifying Relation is acceptable. To work around this:

  • drop the subtype symbol

  • create two Identifying Relations Item::FloorItem and Item::BandItem

  • The Relations that we had to make Non-identifying can now be Identifying.

  • ERwin will now resolve the Migrated PKs as FKs, without duplication.

  • Yes, chuck the Roles back in.

U.9) Now I understand what you are trying to do with the Review cluster, so first, let me say that you have modelled it correctly, all the way down to Rating.

  • But there is a basic problem at Review itself. With a PK of ReviewerId, one Reviewer will only be allowed one review. Of course, you want one Review per Reviewer per Band/BandItem, but that is hidden further down in the subtype. Basically the use of Supertype-Subtype here is too restrictive. Good to understand, in that early stage, but now we need to move beyond it.
  • Instead of the Review cluster, create two Review tables, a BandReview and ItemReview.
  • The Relations will be Person::BandReview and Band::BandReview, and Person::ItemReview and BandItem::ItemReview
  • Then each of them will have children %Rating, %Comment, %CommentRating.

M.13) Order::OrderShipAddress is 1::0-1, correct. PartyAddress::Order is 1::0-n, correct Therefore the Shipping Address should be PartyAddress::OrderShipAddress 1::0-n

M.14) Payment currently allows only one payment per Order, which may be what you require, but the relation is 1::1-n. If you need more, then add a SequenceNo to the PK.

M.15) Genre is fine. But SubGenre needs something in the PK to allow more than one Genre. I would now change Genre.Name to Genre.Genre, and add SubGenre to the SubGenre PK.

  • that will fix Event.GenreId as well.

M.16) Venue needs a Name for a PK for now. If you are ready for better keys, then ShortName, and Name moves down as an attribute.

Q.4) Confirming. Since we have an Identifying Relation in Order, and the PK (PartyId, OrderNo) therefore OrderNo is a sequential number within PartyId, correct ?

Go for V1.5. Include some Attributes. The best way to identify them is to either start a Function Model (and now work the Data Model side-by-side with it) or at least work through all the functions for all the screens.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜