Tricky Rails3/mysql query
In rails 3 (also with meta_where gem if you feel like using it in your query), I got a really tricky query that I have been banging my head for:
Suppose I have two models, customer开发者_开发问答s and purchases, customer have many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat_customers by each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customer gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in ruby, something like:
Customer.includes(:purchases).all.select{|x| x.purchases.count >= 2 }.group_by{|x| x.purchases.second.created_at.to_date }.map{|date, customers| [date, customers.count]}
However, the above code will fire query on the same lines of Customer.all
and Purchase.all
, then do a bunch of calculation in ruby. I would much prefer doing selection, grouping and calculations in mysql, since it is not only much faster, it also reduces the bandwith from the database. In large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in rails/active_record, but have no luck even with the nice meta_where gem. If I have to, I will accept a solution in pure mysql query as well.
Edited: I would cache it (or add a "repeat" field to customers), though only for this simplified problem. The criteria for repeat customer can change by the client at any point (2 purchases, 3 purchases, 4 purchases etc), so unfortunately I do have to calculate it on the spot.
SELECT p_date, COUNT(customers.id) FROM
(
SELECT p_date - INTERVAL 1 day p_date, customers.id
FROM
customers NATURAL JOIN purchases
JOIN (SELECT DISTINCT date(purchase_date) p_date FROM purchases) p_dates
WHERE purchases.purchase_date < p_date
GROUP BY p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this, it'll be too slow. Since the data never changes once the day is passed, just cache it for each day.
精彩评论