Segment purchases based on new vs returning

I'm trying to write a query that takes a particular date and counts how many of the customers who ordered on that date have placed orders previously and how many are new. For simplicity, here is the table layout:

id (auto) | cust_id | purchase_date
-----------------------------------  
1         | 1       | 2010-11-15  
2         | 2       | 2010-11-15  
3         | 3       | 2010-11-14  
4         | 1       | 2010-11-13  
5         | 3       | 2010-11-12  

I was trying to select orders by date, join any previous orders on the same user_id from earlier dates, and then count how many customers had earlier orders versus how many didn't. This was my failed attempt:

SELECT SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END) AS new,
       SUM(CASE WHEN id IS NOT NULL THEN 1 ELSE 0 END) AS returning
  FROM (SELECT o1.*
          FROM orders AS o
          LEFT JOIN orders AS o1
            ON (o1.user_id = o.user_id
                AND DATE(o1.created) = "2010-11-15")
         WHERE DATE(o.created) < "2010-11-15"
         GROUP BY o.user_id) AS t


Given a reference date (2010-11-15), we are interested in the number of distinct customers who placed an order on that date (A), how many of those had placed an order previously (B), and how many had not (C). Clearly, A = B + C.
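To make A, B, and C concrete, here is a minimal sketch that loads the five sample rows from the question and computes the three numbers directly. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely so the example is runnable; the helper name segment_counts is hypothetical and not part of the original thread.

```python
import sqlite3

# Sample rows from the question: (id, cust_id, purchase_date)
ROWS = [
    (1, 1, "2010-11-15"),
    (2, 2, "2010-11-15"),
    (3, 3, "2010-11-14"),
    (4, 1, "2010-11-13"),
    (5, 3, "2010-11-12"),
]


def segment_counts(ref_date):
    """Return (A, B, C) for ref_date: total, returning, and new customers."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    conn.execute(
        "CREATE TABLE Orders (id INTEGER PRIMARY KEY, Cust_ID INTEGER, Purchase_Date TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ROWS)

    # Customers who ordered on the reference date
    on_date = {
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT Cust_ID FROM Orders WHERE Purchase_Date = ?", (ref_date,)
        )
    }
    # Customers who ordered on any earlier date (ISO dates compare correctly as text)
    earlier = {
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT Cust_ID FROM Orders WHERE Purchase_Date < ?", (ref_date,)
        )
    }
    conn.close()

    a = len(on_date)            # A: all customers on the reference date
    b = len(on_date & earlier)  # B: those who had ordered before
    c = len(on_date - earlier)  # C: those who had not
    return a, b, c


print(segment_counts("2010-11-15"))  # (2, 1, 1)
```

On 2010-11-15 customers 1 and 2 ordered; customer 1 also ordered on 2010-11-13, so A = 2, B = 1, C = 1.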

Q1: Count of customers who placed an order on the reference date

SELECT COUNT(DISTINCT Cust_ID)
  FROM Orders
 WHERE Purchase_Date = '2010-11-15';

Q2: List of customers placing order on reference date

SELECT DISTINCT Cust_ID
  FROM Orders
 WHERE Purchase_Date = '2010-11-15';

Q3: List of customers who placed an order on reference date who had ordered before

SELECT DISTINCT o1.Cust_ID
  FROM Orders AS o1
  JOIN (SELECT DISTINCT o2.Cust_ID
          FROM Orders AS o2
         WHERE o2.Purchase_Date = '2010-11-15') AS c1
    ON o1.Cust_ID = c1.Cust_ID
 WHERE o1.Purchase_Date < '2010-11-15';

Q4: Count of customers who placed an order on the reference date who had ordered before

SELECT COUNT(DISTINCT o1.Cust_ID)
  FROM Orders AS o1
  JOIN (SELECT DISTINCT o2.Cust_ID
          FROM Orders AS o2
         WHERE o2.Purchase_Date = '2010-11-15') AS c1
    ON o1.Cust_ID = c1.Cust_ID
 WHERE o1.Purchase_Date < '2010-11-15';

Q5: Combining Q1 and Q4

There are several ways to do the combining. One is to use Q1 and Q4 as (complicated) expressions in the select-list; another is to use them as tables in the FROM clause which don't need a join between them because each is a single-row, single-column table that can be joined in a Cartesian product. Another would be a UNION, where each row is tagged with what it calculates.

SELECT (SELECT COUNT(DISTINCT Cust_ID)
          FROM Orders
         WHERE Purchase_Date = '2010-11-15') AS Total_Customers,
       (SELECT COUNT(DISTINCT o1.Cust_ID)
          FROM Orders AS o1
          JOIN (SELECT DISTINCT o2.Cust_ID
                  FROM Orders AS o2
                 WHERE o2.Purchase_Date = '2010-11-15') AS c1
            ON o1.Cust_ID = c1.Cust_ID
         WHERE o1.Purchase_Date < '2010-11-15') AS Returning_Customers
  FROM Dual;

(I'm blithely assuming MySQL has a DUAL table - similar to Oracle's. If not, it is trivial to create a table with a single column containing a single row of data. Update 2: bashing the MySQL 5.5 Manual shows that 'FROM Dual' is supported but not needed; MySQL is happy without a FROM clause.)

Update 1: added qualifier 'o1.Cust_ID' in key locations to avoid 'ambiguous column name' as indicated in the comment.
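For comparison, the other two ways of combining Q1 and Q4 mentioned above (as derived tables in the FROM clause, and as a tagged UNION) might look roughly like the sketch below. It runs the SQL through Python's built-in sqlite3 module against the question's sample rows; using SQLite instead of MySQL is an assumption made so the example can be executed, and the names make_db, combine_cross_join, combine_union, and the column alias n are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, cust_id, purchase_date)
ROWS = [
    (1, 1, "2010-11-15"),
    (2, 2, "2010-11-15"),
    (3, 3, "2010-11-14"),
    (4, 1, "2010-11-13"),
    (5, 3, "2010-11-12"),
]

# Q1 and Q4 from the answer, each returning one row with one column (aliased n)
Q1 = "SELECT COUNT(DISTINCT Cust_ID) AS n FROM Orders WHERE Purchase_Date = :d"
Q4 = """SELECT COUNT(DISTINCT o1.Cust_ID) AS n
          FROM Orders AS o1
          JOIN (SELECT DISTINCT o2.Cust_ID
                  FROM Orders AS o2
                 WHERE o2.Purchase_Date = :d) AS c1
            ON o1.Cust_ID = c1.Cust_ID
         WHERE o1.Purchase_Date < :d"""

# Variant 1: both queries as single-row tables joined in a Cartesian product
CROSS_JOIN_SQL = f"""
SELECT t.n AS Total_Customers, r.n AS Returning_Customers
  FROM ({Q1}) AS t
 CROSS JOIN ({Q4}) AS r
"""

# Variant 2: a UNION where each row is tagged with what it calculates
UNION_SQL = f"""
SELECT 'total' AS Measure, n FROM ({Q1}) AS t
UNION ALL
SELECT 'returning' AS Measure, n FROM ({Q4}) AS r
"""


def make_db():
    """Create an in-memory Orders table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (id INTEGER PRIMARY KEY, Cust_ID INTEGER, Purchase_Date TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ROWS)
    return conn


def combine_cross_join(conn, ref_date):
    """Return (total, returning) as one row."""
    return conn.execute(CROSS_JOIN_SQL, {"d": ref_date}).fetchone()


def combine_union(conn, ref_date):
    """Return {'total': ..., 'returning': ...} built from the tagged rows."""
    return dict(conn.execute(UNION_SQL, {"d": ref_date}).fetchall())


conn = make_db()
print(combine_cross_join(conn, "2010-11-15"))  # (2, 1)
print(combine_union(conn, "2010-11-15"))       # {'total': 2, 'returning': 1}
conn.close()
```

The Cartesian-product form returns both figures in one row, which is convenient for reporting; the UNION form returns one row per measure, which is easier to extend with further measures.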


How about

-- Note: ORDER_COUNT covers all of a customer's orders, not only those
-- placed before the date of interest.
SELECT * FROM
  (SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 1 AS OLD_CUSTOMER, 0 AS NEW_CUSTOMER
     FROM ORDERS
     GROUP BY CUST_ID
     HAVING ORDER_COUNT > 1
   UNION ALL
   SELECT CUST_ID, COUNT(*) AS ORDER_COUNT, 0 AS OLD_CUSTOMER, 1 AS NEW_CUSTOMER
     FROM ORDERS
     GROUP BY CUST_ID
     HAVING ORDER_COUNT = 1) G
  INNER JOIN
    (SELECT CUST_ID, PURCHASE_DATE
       FROM ORDERS) O
    USING (CUST_ID)
  WHERE PURCHASE_DATE = [date of interest] AND
        OLD_CUSTOMER = [0 or 1, depending on what you want] AND
        NEW_CUSTOMER = [0 or 1, depending on what you want]

Not sure if that'll do the whole thing, but it might provide a starting point.

Share and enjoy.


select count(distinct o1.cust_id) as repeat_count,
       count(distinct o.cust_id) - count(distinct o1.cust_id) as new_count
    from orders o
        left join (select cust_id
                       from orders
                       where purchase_date < '2010-11-15'
                       group by cust_id) o1
            on o.cust_id = o1.cust_id
    where o.purchase_date = '2010-11-15';