开发者

Optimizing a big MySQL aggregate query for analytics

I'm trying to build out some basic marketing analytics tools and want to provide an "Transactions to date at Day N" summary for each campaign code.

Is there a way to make a query like this more efficient? For each day_n column I want to count all transactions made before or on that day.

SELECT 
c.campaign_code, 
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 1) as day_1,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 2) as day_2,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 3) as day_3,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 4) as day_4,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 5) as day_5,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 6) as day_6,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 7) as day_7,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 14) as day_14,
(select count(*) from _t_transac开发者_如何学JAVAtions where campaign_code=c.campaign_code and day <= 30) as day_30,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 60) as day_60,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 90) as day_90
FROM campaigns c LEFT JOIN _t_transactions t ON c.campaign_code=t.campaign_code

Table structure is...

CREATE TEMPORARY TABLE `campaigns` (
`campaign_code` varchar(255) DEFAULT NULL,
);

CREATE TABLE `_t_transactions` (
`id` int(11) DEFAULT NULL,
`campaign_code` varchar(255) DEFAULT NULL,
`day` int(11) DEFAULT NULL
);


So the question wasn't quite representative--what I really needed to accomplish was an ROI calculation after N days. Turns out the conditional that @tandu suggested works perfectly, and returns in 0.5 seconds instead of 33+ seconds (for my fairly small data set).

And of course, as tandu also points out, this is not really a job for MySQL.

Here's what I ended up with:

SELECT 
c.campaign_code, c.clicks, c.cpc, c.spent,
sum(IF(day <= 1, amount, 0)) / c.spent as day_1,
sum(IF(day <= 2, amount, 0)) / c.spent as day_2,
sum(IF(day <= 3, amount, 0)) / c.spent as day_3,
sum(IF(day <= 4, amount, 0)) / c.spent as day_4,
sum(IF(day <= 5, amount, 0)) / c.spent as day_5,
sum(IF(day <= 6, amount, 0)) / c.spent as day_6,
sum(IF(day <= 7, amount, 0)) / c.spent as day_7,
sum(IF(day <= 14, amount, 0)) / c.spent as day_14,
sum(IF(day <= 30, amount, 0)) / c.spent as day_30,
sum(IF(day <= 60, amount, 0)) / c.spent as day_60,
sum(IF(day <= 90, amount, 0)) / c.spent as day_90
FROM _t_transactions t LEFT JOIN campaigns c ON c.campaign_code=t.campaign_code 
GROUP BY t.campaign_code;

Didn't realize conditions could be used in aggregate functions... really useful in this case. This also solves a related problem for me of wanting to use temporary tables for this, but not being able to reference the same temp table multiple times in the same query.


I'm no expert, and the MySQL query optimizer might take care of this stuff for you, but here are a few tips:

  1. If you don't GROUP BY campaign_code, won't you get a row for every transaction? Maybe I'm wrong.
  2. It seems like you're only interested in transactions with day <= 90. Add this to the WHERE clause to significantly reduce the time on your results.
  3. Instead of searching in transactions campaign_code, it would be much faster to use a surrogate integer that is indexed.

Restructure your tables:

CREATE TEMPORARY TABLE campaigns (
   campaign_id int unsigned not null auto_increment primary key,
   campaign_code varchar(255)
);

CREATE TABLE `_t_transaction` (
   `id` int(11),
   `campaign_id` int unsigned not null,
   key (campaign_id)
   foreign key (campaign_id) references campaigns (campaign_id),
   `day` int(11)
);

It may also be faster to use a derived table that gets the information you want:

SELECT
   campaign_code,
   SUM(IF(day <= 1, 1, 0)) as day_1,
   SUM(IF(day <= 2, 1, 0)) as day_2,
   -- ...
FROM
   campaigns
   NATURAL JOIN _t_transactions
   NATURAL JOIN (
      SELECT
         id
      FROM
         _t_transactions
      WHERE
         day <= 90
   ) derived
GROUP BY
   campaign_code

This will examine significantly fewer rows and could save you some time if the initial query is running long. Or not.

Final recommendation is to not use MySQL for financial analytic storage and processing as it's not really designed for that. You can see how hard it is to perform a pivot. Either switch to a different RDBMS that's designed for this sort of thing, or delegate it to your scripting language of choice.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜