开发者

Doesn't anyone use a Customer Fact?

I am in the process of designing a new customer fact and dimensions for my warehouse. In my search for good example models I noticed something odd. No one seems to have a customer-centric fact. Every example I found has a transaction event such as a sale or order as the central fact with customers as a dimension. This raises a question for me.

Am I doing something seriously wrong by wanting a customer fact? The goal is to enable analysis of customer behavior such as order frequency, total spend, acquisition cost, distinct, product count,...etc These questions naturally imply a fact to me not a dimension. I already have an order fact that is great for order-centric queries but is not good for customer-centric queries.

To give you a little more detail the Customer Fact will likely have the following measures and dimensions:

Measures:

  • Count of Customers
  • Distinct Product Count
  • Completed Order Count
  • Total Revenue
  • Total Cost
  • Count of Coupons Received
  • Count of Coupons Redeemed
  • Cost of Coupons开发者_StackOverflow Redeemed

Dimensions:

  • Order Delivery Date
  • Order Delivery Time
  • Order Delivery Geography
  • Acquisition Source
  • Order Type
  • Coupon Type

The above seems pretty natural to me but I am concerned I am missing an obvious no-no by taking a customer-centric approach in this new cube.


We have customer facts. A lot of the time they are factless fact tables which just link several dimensions.

Sounds like a lot of your facts are derived or summary. Grain will still be important. If you say order count is that MTD (and what date) or for all time, etc.

I don't think there is anything wrong with that, but I think because this is derived data, most people would put it in a "data mart" or whatever the best unambiguous term for a subset for analysis.

I agree that modeling it the same way is perfectly valid. The only thing you have to watch out for is the same with all derived data, it needs to remain consistent.

Your customer will have a dimension (conformed, since it is shared between models) and then a CustomerStats fact table or whatever, with every fact at that grain which shares all those dimensions.


The reason so many systems are order-centric instead of customer-centric is that how you identify the customer changes so frequently over time: previously treating a business as the customer evolves into treating individual employees as the customers or vice versa, or a customer will change/split/merge addresses, or a business changes its name and we want to consolidate (or segregate) old and new performance totals, or a shipping address and billing address now have to be expanded to include a support address, or operators forget or mistake one address purpose for another, or a customer wants to use a special shipping address only temporarily, or etc etc.

This is addressed in more detail here.


I may have misunderstood your question, but let's see what can be learned about customer behaviour from the factOrder only, the old-fashioned way.

Assuming that grain of the factOrder is one line on the order and that there is OrderID as a degenerate dimension.

-- Number of customers who ordered something at least once
select
    count(distinct CustomerKey) as PayingCustomers
from factOrder ;

.

-- Number of orders and sales per customer
select 
      CustomerKey
    , count(distinct OrderID) as NumberOfOrders
    , sum(ExtendedPrice)      as Total
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfOrders, y = People, Amount)
with
orders_per_customer as (
    select 
      CustomerKey
    , count(distinct OrderID) as cnt
    , sum(ExtendedPrice)      as Total
    from factOrder
    group by CustomerKey
)
select
      cnt        as NumberOfOrders
    , count(1)   as People
    , sum(Total) as Amount
from orders_per_customer
group by cnt
order by cnt asc ;

.

-- Distinct products ordered by customer
select 
      CustomerKey
    , count(distinct ProductKey) as DistinctProductsOrdered
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfDistinctProducts, y = People)
with
products_per_customer as (
    select 
      CustomerKey
    , count(distinct ProductKey)  as cnt
    from factOrder
    group by CustomerKey
)
select
      cnt       as NumberOfDistinctProducts
    , count(1)  as People
from products_per_customer
group by cnt
order by cnt asc ;


Measures such as order frequency, total spend, acquisition cost, distinct product count are actually derived from Orders as a fact table, with Customers as a dimension. Aggregating per customer could just as easily been aggregating per product or per geographic location.

As Cade Roux has suggested, you can build a customers aggregated table, which should be detached from the other fact tables, however that would purely be a performance decision. You retain maximum flexibility by constructing Customers as a dimension of Orders.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜