MySQL - Joining two tables with datetime columns on Date and three most recent entries before that date
I have two tables in my SQL
For example Table1 - ItemPrice:
DATETIME | ITEM | PRICE
2011-08-28 | ABC 123
2011-09-01 | ABC 125
2011-09-02 | ABC 124
2011-09-03 | ABC 127
2011-09-04 | ABC 126
Table2 - DayScore:
DATETIME | ITEM | SCORE
2011-08-28 | ABC 1
2011-08-29 | ABC 开发者_运维百科 8
2011-09-01 | ABC 4
2011-09-02 | ABC 2
2011-09-03 | ABC 7
2011-09-04 | ABC 3
I want to write a query, which given a item ID (e.g. ABC), will return the price at that date from ItemPrice
(of there is no price for that date then the query should not return anything). If a valid price is found for the query date, the query should return (in 9 columns)
- the price of the item from
ItemPrice
for the past three days (i.e. the most recent 3 prices before the date queried). - In the next three columns it should return, from
DayScore
, the matching score for those 3 dates selected from ItemPrice. - Finally the dates (t-1 to t-3) selected
In otherwords the results for this query looking at just date='2011-09-03' as an example for item='abc' would return:
DATE | ITEM | PRICE | SCR | PRC_t-1 | PRC_t-2 | PRC_t-3 | SCR_t-1 | SCR_t-2 | SCR_t-3 | DATE_t-1 | DATE_t-2 | DATE_t-3
2011-09-03| ABC | 127 | 7 | 124 | 125 | 123 | 2 | 4 | 1 | 2011-09-02| 2011-09-01| 2011-08-28
....
Etc for each date that appears in ItemPrice
table.
What is the neatest and most efficient way to run this query (as its something that will be run over many millions of rows)?
Cheers!
Pretty no but it does produce the results. You could probably get rid of some subselects and make it a bit less sql but I tried to build it up in steps so you can deduct what it is doing.
The core part is this select:
SELECT
Sub2.*
, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
FROM
(SELECT
Sub1.*
, (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
FROM
(SELECT
ItemPrice.DateTime
, (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1
From ItemPrice) Sub1
) Sub2
This returns a table with the dates (now, t-1, t-2, t-3). From there is is simple joining with price and score for each of those dates. The whole things including testdata the becomes this bulk of sql
/*
CREATE TABLE ItemPrice (datetime Date, item varchar(3), price int);
CREATE TABLE DayScore ( datetime Date, item varchar(3), score int);
INSERT INTO ItemPrice VALUES ('20110828', 'ABC', 123);
INSERT INTO ItemPrice VALUES ('20110901', 'ABC', 125);
INSERT INTO ItemPrice VALUES ('20110902', 'ABC', 124);
INSERT INTO ItemPrice VALUES ('20110903', 'ABC', 127);
INSERT INTO ItemPrice VALUES ('20110904', 'ABC', 126);
INSERT INTO DayScore VALUES ('20110828', 'ABC', 1);
INSERT INTO DayScore VALUES ('20110829', 'ABC', 8);
INSERT INTO DayScore VALUES ('20110901', 'ABC', 4);
INSERT INTO DayScore VALUES ('20110902', 'ABC', 2);
INSERT INTO DayScore VALUES ('20110903', 'ABC', 7);
INSERT INTO DayScore VALUES ('20110904', 'ABC', 3);
*/
SELECT Hist.*, Current.Item, Current.Price, Current.Score
, Minus1.Price as PRC_1, Minus1.Score SCR_1
, Minus2.Price as PRC_2, Minus2.Score SCR_2
, Minus3.Price as PRC_3, Minus3.Score SCR_3
FROM
(SELECT Sub2.*, (Select MAX(IP3.DateTime) FROM ItemPrice IP3 where IP3.DateTime < T_2) AS T_3
FROM
(SELECT Sub1.*, (Select MAX(IP2.DateTime) FROM ItemPrice IP2 where IP2.DateTime < T_1) AS T_2
FROM
(SELECT ItemPrice.DateTime, (Select MAX(IP.DateTime) FROM ItemPrice IP where IP.DateTime < ItemPrice.DateTime) AS T_1 From ItemPrice) Sub1) Sub2) Hist
INNER JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Item, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) CURRENT
ON (Current.DateTime = Hist.DateTime)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS1
ON (Minus1.DateTime = Hist.T_1)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS2
ON (Minus2.DateTime = Hist.T_2)
LEFT JOIN
(SELECT ItemPrice.DateTime, ItemPrice.Price, DayScore.Score FROM ItemPrice INNER JOIN DayScore ON (ItemPrice.Item = DayScore.Item AND ItemPrice.Datetime = DayScore.DateTime)) MINUS3
ON (Minus3.DateTime = Hist.T_3)
WHERE Current.Item = 'ABC'
;
/*
DROP TABLE ItemPrice;
DROP TABLE DayScore;
*/
I'm curious about your explain plan when you do this on 1M rows :) It might not even be that horrible if you have the right indexes which you probably do.
精彩评论