开发者

Data Warehousing Design Question

I'm developing a data warehouse and have come up against a problem I'm not sure how to fix. The current schema is defined below:

DimInstructor <- Dimension table for instructors DimStudent <- Dimension table for students

I want to implement a scenario whereby if details of an instructor change in my OLTP database, I want to add a new record in the DimInstructor table for historical reporting reasons.

Now, I'm wanting to create a lesson dimension table called DimLesson. In DimLesson I want to create a reference to the instructor.

The DimInstructor table contains:

InstructorDW开发者_Python百科ID <- Identity field when entered into DW InstructorID <- The instructor ID that has come from the OLTP database

Now, I can't make InstructorID a primary key because it isn't guaranteed to be unique (if the instructor changes their name, there will be 2 records in the DW with the same InstructorID value).

So my question is, how do I reference the instructor from DimLesson? Do I use the InstructorDWID? If so, should I have 2 entries for an instructor in DimInstructor, it would make queries more complicated when I'm wanting to look at all lessons by a specific instructor.

Any help would be appreciated!


What you are describing here is usually called type 2 dimension. Kimball data warehouse books have whole sections on type 2 dimensions and ETL for the type -- do read.

The first thing to understand is the difference between the primary key and the business key. The primary key uniquely identifies a row in the table, while the business key uniquely identifies an entity that the table describes, like an instructor. For example, if an instructor changes name, the dimInstructor table may look something like:

InstructorKey  InstructorBusinessKey  FirstName LastName  row_ValidFrom row_ValidTo   row_Status
  1234           jane_doe_7211           Jane     Doe       2000-03-11   2010-08-12     expired
  7268           jane_doe_7211           Jane     Smith     2010-08-12   3000-01-01     current

Now, providing that the dimLesson is proper design for your business model (as opposed to having some kind of fact) the dimLesson would have a column called InstructorKey. During ETL process, when delivering the new row (7258) to the dimInstructor table, replace all references to row 1234 in the dimLesson with 7268 .


Paul,

There are multiple ways you can handle this. You can use an effective date/inactive date, sequence number or a version number to differentiate the records with the same InstructorID.

The DIM that captures all relevant details would be like..

create table DIM_INSTRUCTOR(
  instr_guid number, --populated through a sequence     -----Composite pk-Part1
  istr_oid   number, --direct id from the OLTP system   -----cmposite  pk-part2
  instr_name number,
  other_attr varchar2(25),
  eff_date   date,
  expiration_date date
);

instr_guid is directly generated from a sequence and is independent of the OLTP system.

This would let you capture all the details for a given instructor. You can use just the instr_guid as the foreign key to the fact table, but including both of them (instr_guid,instr_guid) would increase the ease of querying .. which is one of the goals of Datawarehousing.

Useful Links:

http://en.wikipedia.org/wiki/Surrogate_key http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2


Use a guid/uuid as the primary key or a combination of columns

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜