Star-schema: Separate dimensions for clients and non-clients or shared dimension for attendants?
I'm new to modeling star schemas, fresh from reading the Data Warehouse Toolkit.
I have a business process that has clients and non-clients calling into conference calls with some of our employees.
My fact table, call it "Audience", will contain a measure of how long an attending person was connected to the call, and the cost of this person's connection to the call. The grain is "individual connection to the conference call".
Should I use my conformed Client dimension and create a non-client dimension (for the callers that are not yet clients) this way (omitting dimensions that are not part of this questions):
Or would it be OK/better to have a non-conformed Attending dimension related to the 开发者_高级运维conformed Client dimension in this manner:
Or is there a better/standard mechanism to model business processes like this one?
Edit:
What about using model 2 above, but creating a view on top of the client dimension table and the attending dimension to make it look like it is only one dimension?
Is that an acceptable alternative to Damir's answer below?
There is no need to split clients into two tables (dimensions). Simply put all clients, active and prospects into the same dimension table. You could then introduce an IsActive attribute (column) to distinguish between paying clients and prospects. Sooner or later you will use a data mining tool to learn more about clients and what distinguishes people who are willing to pay for your service from those who are not. In order for the algorithm to work, you have to supply data for both groups of people -- those who are paying and those who are not paying. To summarize, prospects belong to the same table as paying clients.
With this, you can use your model No 1. Make sure that measures in the fact table make sense. For example if a call_id =123 had 10 people participating, then
sum(cost_of_connection)
from factAudience
where call_id = 123;
should return the total cost of the call, not something meaningless -- like 10x the real cost.
EDIT
A "paying client" and a "prospect client" are both a type of a client, therefore belong to the same dimension table -- dimClient. Somewhere in the DW there is a factSale (or similar) with FK to the dimSale. Even if you do not have a column in dimClient to differentiate between paying and prospects -- you can still get paying clients by joining factSale and dimClient.
"Who is a customer?" is a common debate when introducing a DW in an organization. In order to be able to analyze client acquisition, retention, conversion, etc., prospects have the same treatment as paying customers -- at least in the DW. Keep in mind that acquiring and creating new customers is on the top of the list for (almost) any CEO.
I would go for the second: it models the attendees in their own, dedicated dimension, whilst allowing you to expose their client-ness (or otherwise) via an attribute in that dimension, which is probably the way you would want to drill down in real life ("show me all attendees" followed by "and now which of those are clients").
In your client dimension I would populate the client_id for all attendees, matching to an "unknown" element where the attendee is not a client.
There's a nice discussion about this here:
http://crpit.com/confpapers/CRPITV75Riazati.pdf
It makes little difference. The second version is possibly more correct, but does your olap system support this?
The second one looks like "snowflake schema" to me. Look into snowflake schema, starting with the wikipedia article. You'll see several comparisons between star and snowflake.
精彩评论