Doesn't anyone use a Customer Fact?
I am in the process of designing a new customer fact and dimensions for my warehouse. In my search for good example models I noticed something odd. No one seems to have a customer-centric fact. Every example I found has a transaction event such as a sale or order as the central fact with customers as a dimension. This raises a question for me.
Am I doing something seriously wrong by wanting a customer fact? The goal is to enable analysis of customer behavior such as order frequency, total spend, acquisition cost, distinct, product count,...etc These questions naturally imply a fact to me not a dimension. I already have an order fact that is great for order-centric queries but is not good for customer-centric queries.
To give you a little more detail the Customer Fact will likely have the following measures and dimensions:
Measures:
- Count of Customers
- Distinct Product Count
- Completed Order Count
- Total Revenue
- Total Cost
- Count of Coupons Received
- Count of Coupons Redeemed
- Cost of Coupons开发者_StackOverflow Redeemed
Dimensions:
- Order Delivery Date
- Order Delivery Time
- Order Delivery Geography
- Acquisition Source
- Order Type
- Coupon Type
The above seems pretty natural to me but I am concerned I am missing an obvious no-no by taking a customer-centric approach in this new cube.
We have customer facts. A lot of the time they are factless fact tables which just link several dimensions.
Sounds like a lot of your facts are derived or summary. Grain will still be important. If you say order count is that MTD (and what date) or for all time, etc.
I don't think there is anything wrong with that, but I think because this is derived data, most people would put it in a "data mart" or whatever the best unambiguous term for a subset for analysis.
I agree that modeling it the same way is perfectly valid. The only thing you have to watch out for is the same with all derived data, it needs to remain consistent.
Your customer will have a dimension (conformed, since it is shared between models) and then a CustomerStats fact table or whatever, with every fact at that grain which shares all those dimensions.
The reason so many systems are order-centric instead of customer-centric is that how you identify the customer changes so frequently over time: previously treating a business as the customer evolves into treating individual employees as the customers or vice versa, or a customer will change/split/merge addresses, or a business changes its name and we want to consolidate (or segregate) old and new performance totals, or a shipping address and billing address now have to be expanded to include a support address, or operators forget or mistake one address purpose for another, or a customer wants to use a special shipping address only temporarily, or etc etc.
This is addressed in more detail here.
I may have misunderstood your question, but let's see what can be learned about customer behaviour from the factOrder only, the old-fashioned way.
Assuming that grain of the factOrder is one line on the order and that there is OrderID as a degenerate dimension.
-- Number of customers who ordered something at least once
select
count(distinct CustomerKey) as PayingCustomers
from factOrder ;
.
-- Number of orders and sales per customer
select
CustomerKey
, count(distinct OrderID) as NumberOfOrders
, sum(ExtendedPrice) as Total
from factOrder
group by CustomerKey ;
.
-- Histogram (x = NumberOfOrders, y = People, Amount)
with
orders_per_customer as (
select
CustomerKey
, count(distinct OrderID) as cnt
, sum(ExtendedPrice) as Total
from factOrder
group by CustomerKey
)
select
cnt as NumberOfOrders
, count(1) as People
, sum(Total) as Amount
from orders_per_customer
group by cnt
order by cnt asc ;
.
-- Distinct products ordered by customer
select
CustomerKey
, count(distinct ProductKey) as DistinctProductsOrdered
from factOrder
group by CustomerKey ;
.
-- Histogram (x = NumberOfDistinctProducts, y = People)
with
products_per_customer as (
select
CustomerKey
, count(distinct ProductKey) as cnt
from factOrder
group by CustomerKey
)
select
cnt as NumberOfDistinctProducts
, count(1) as People
from products_per_customer
group by cnt
order by cnt asc ;
Measures such as order frequency
, total spend
, acquisition cost
, distinct product count
are actually derived from Orders
as a fact table, with Customers as a dimension. Aggregating per customer could just as easily been aggregating per product or per geographic location.
As Cade Roux has suggested, you can build a customers aggregated table, which should be detached from the other fact tables, however that would purely be a performance decision. You retain maximum flexibility by constructing Customers
as a dimension of Orders
.
精彩评论