How to use Group By and self-join to return min, max, open, and close daily price restult set?
SOLVED
All hail StackOverlow!
While I was gone, people left 2 solutions (thanks guys--what is the protocol for handing out karma for two working solutions?)
Here is the solution that I came back to post. it is derived from yet ANOTHER StackOver solution:
How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?
...and my adaptation is:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE ASC
)
, ',', 1 ) AS opn_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE DESC
)
, ',', 1 ) AS cls_price
FROM `CHART_DATA`
GROUP BY trading_day
;
The data the "Q" above starts with is the data with which I am trying to end. Hopefully, this helps someone else since I suspect that my log is fairly common.
I am willing to bet that one of these three solutions has a performance advantage. If anyone happens to know the inner workings of MySQL and query optimization and cares to recommend the "preferred" solution, that will be useful to know in the future.
END SOLVED
UPDATE #2
Tryhing to come at it from yet another direction using this:
http://forums.mysql.com/read.php?65,363723,363723
I get:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
(SELECT opn_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS opn_price,
MIN(DTE) AS opn
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY opn ASC LIMIT 1) AS tblO) AS opnqt,
(SELECT cls_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS cls_price,
MIN(DTE) AS cls
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY cls DESC LIMIT 1) AS tblC) AS clsqt
FROM `CHART_DATA` cht
GROUP BY trading_day;
This suffers from a similar dysfunction as the query in the first UPDATE below; the 'clsqt' (cls_price) value returned is the last closing price found in the data. Blech.
Plus, we are starting to get into "hideously complex" query space again and that CANNOT be good for performance.
But if anyone sees the fix for the 'clsqt' value, I would accept it gladly and solve the performance issue at a later day. :)
END UPDATE #2
UPDATE
So close...here's where I am today:
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
MIN(cht1.DTE) AS opn_date1,
MIN(cht1.DTE) AS opn_date2,
`cht2`.`PRICE` AS opn_price,
MAX(cht1.DTE) AS cls_date1,
MAX(cht3.DTE) AS cls_date3,
`cht3`.`PRICE` AS cls_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = cht1.DTE
LEFT JOIN `CHART_DATA` cht3
ON cht3.DTE = cht1.DTE
GROUP BY trading_day
HAVING opn_date1 = opn_date2
AND cls_date1 = cls_date3
;
This retuns everything correctly BUT the correct 'cls_price' (it is returnign the same value for 'cls_price' as 'opn_price').
However, 'cls_date1' and 'cls_date3' are the correct values, so I must be close.
Anyone see what I am not?
END UPDATE
I have been poring over SO with regard to left joins and self joins...and I must admit that I am not grokking.
I found this "Q" that seems very close to what I want: left join with condition for right table in mysql
What I am trying to do is retrieve open, close, min, and max price days from a single table (sample data below).
Min and Max are easy:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price
FROM `CHART_DATA`
GROUP BY trading_day;
I want the results returned group by date, somthing like:
'trading_day' 'opn_price' 'min_price' 'max_price' 'cls_price'
Okay, so I try 'baby steps' with just one join...
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
`cht2`.`PRICE` AS opn_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = MIN(cht1.DTE)
GROUP BY trading_day;
...and I get the message "Invalid use of group function"
Of course, removing the "GROUP BY" is no help, since I need to return aggegate columns.
I have a really complex solution that gets the open and close results, but not the min and max--and they are in separate result sets. I get the feeling that I have made this more complex than is necessary and that fi I could just grasp what is going on with the self joins cited in the "Q" referenced above, that my overall coding would improvie immeasurably. But I have spent something like 12 hours on this during the past weekend and am more confusted than ever.
All insight and explantion and observation is welcome at this point...
/* SAMPLE TABLE AND DATA */
CREATE TABLE `CHART_DATA` (
`ID` varchar(10) DEFAULT NULL,
`DTE` datetime DEFAULT NULL,
`PRICE` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*Data for the table `CHART_DATA` */
INSERT INTO `chart_data`
(`id`,`dte`,`price`)
VALUES ('1','2011-01-01 00:10:00',0.65),
('2','2011-01-01 06:10:00',0.92),
('3','2011-01-01 12:10:00',0.59),
('4','2011-01-01 18:10:00',0.16),
('5','2011-01-02 00:10:00',0.28),
('6','开发者_如何学Go2011-01-02 06:10:00',0.12),
('7','2011-01-02 12:10:00',0.92),
('8','2011-01-02 18:10:00',0.1),
('9','2011-01-03 00:10:00',0.34),
('10','2011-01-03 06:10:00',0.79),
('11','2011-01-03 12:10:00',1.23),
('12','2011-01-03 18:10:00',1.24),
('13','2011-01-04 00:10:00',1.12),
('14','2011-01-04 06:10:00',0.8),
('15','2011-01-04 12:10:00',0.65),
('16','2011-01-04 18:10:00',0.78),
('17','2011-01-05 00:10:00',0.65),
('18','2011-01-05 06:10:00',1.19),
('19','2011-01-05 12:10:00',0.89),
('20','2011-01-05 18:10:00',1.05),
('21','2011-01-06 00:10:00',0.29),
('22','2011-01-06 06:10:00',0.43),
('23','2011-01-06 12:10:00',0.26),
('24','2011-01-06 18:10:00',0.34),
('25','2011-01-07 00:10:00',0.22),
('26','2011-01-07 06:10:00',0.37),
('27','2011-01-07 12:10:00',1.22),
('28','2011-01-07 18:10:00',1.16),
('29','2011-01-08 00:10:00',0.3),
('30','2011-01-08 06:10:00',1.17),
('31','2011-01-08 12:10:00',0.62),
('32','2011-01-08 18:10:00',0.86),
('33','2011-01-09 00:10:00',0.84),
('34','2011-01-09 06:10:00',1.11),
('35','2011-01-09 12:10:00',0.92),
('36','2011-01-09 18:10:00',1.03),
('37','2011-01-10 00:10:00',1.13),
('38','2011-01-10 06:10:00',0.58),
('39','2011-01-10 12:10:00',1.03),
('40','2011-01-10 18:10:00',0.21),
('41','2011-01-11 00:10:00',0.12),
('42','2011-01-11 06:10:00',1.01),
('43','2011-01-11 12:10:00',0.19),
('44','2011-01-11 18:10:00',1.14),
('45','2011-01-12 00:10:00',0.55),
('46','2011-01-12 06:10:00',0.75),
('47','2011-01-12 12:10:00',0.66),
('48','2011-01-12 18:10:00',1.1),
('49','2011-01-13 00:10:00',0.68),
('50','2011-01-13 06:10:00',0.3),
('51','2011-01-13 12:10:00',0.9),
('52','2011-01-13 18:10:00',0.88),
('53','2011-01-14 00:10:00',0.64),
('54','2011-01-14 06:10:00',1.06),
('55','2011-01-14 12:10:00',1.12),
('56','2011-01-14 18:10:00',0.76),
('57','2011-01-15 00:10:00',0.18),
('58','2011-01-15 06:10:00',1.08),
('59','2011-01-15 12:10:00',0.66),
('60','2011-01-15 18:10:00',0.38),
('61','2011-01-16 00:10:00',1),
('62','2011-01-16 06:10:00',1.18),
('63','2011-01-16 12:10:00',1.15),
('64','2011-01-16 18:10:00',0.58),
('65','2011-01-17 00:10:00',1.04),
('66','2011-01-17 06:10:00',0.81),
('67','2011-01-17 12:10:00',0.35),
('68','2011-01-17 18:10:00',0.91),
('69','2011-01-18 00:10:00',0.14),
('70','2011-01-18 06:10:00',0.13),
('71','2011-01-18 12:10:00',1.03),
('72','2011-01-18 18:10:00',0.16),
('73','2011-01-19 00:10:00',1.05),
('74','2011-01-19 06:10:00',1.13),
('75','2011-01-19 12:10:00',1.21),
('76','2011-01-19 18:10:00',0.34),
('77','2011-01-20 00:10:00',0.63),
('78','2011-01-20 06:10:00',0.62),
('79','2011-01-20 12:10:00',0.19),
('80','2011-01-20 18:10:00',1.21),
('81','2011-01-21 00:10:00',0.83),
('82','2011-01-21 06:10:00',0.99),
('83','2011-01-21 12:10:00',0.83),
('84','2011-01-21 18:10:00',0.21),
('85','2011-01-22 00:10:00',0.8),
('86','2011-01-22 06:10:00',0.69),
('87','2011-01-22 12:10:00',0.87);
So you want:
- The alias
C
to correspond to all rows in the group for a given day, so you can useMAX()
andMIN()
over the rows in that group. - The alias
C2
to correspond to the last row in a given day. - The alias
C3
to correspond to a row later thanC2
on the same day. If none is found, i.e.C3.*
is NULL, thenC2
is the latest on that day.
This is often labeled a greatest-n-per-group
query, and it comes up frequently on Stack Overflow. Here's a solution that I tested for your test data, but you can follow the tag I added to your question for other solutions and discussion.
edit: I missed the requirement for both opening price and closing price. The following is edited.
SELECT DATE_FORMAT(C.`DTE`, '%m/%d/%Y') AS trading_day,
MIN(C.`PRICE`) AS min_price,
MAX(C.`PRICE`) AS max_price,
Copen.`PRICE` AS opening_price,
Cclose.`PRICE` AS closing_price
FROM `CHART_DATA` AS C
INNER JOIN `CHART_DATA` AS Cclose
ON DAY(C.`DTE`) = DAY(Cclose.`DTE`)
LEFT OUTER JOIN `CHART_DATA` AS Cclose_later
ON DAY(C.`DTE`) = DAY(Cclose_later.`DTE`) AND Cclose.`DTE` < Cclose_later.`DTE`
INNER JOIN `CHART_DATA` AS Copen
ON DAY(C.`DTE`) = DAY(Copen.`DTE`)
LEFT OUTER JOIN `CHART_DATA` AS Copen_earlier
ON DAY(C.`DTE`) = DAY(Copen_earlier.`DTE`) AND Copen.`DTE` < Copen_earlier.`DTE`
WHERE Cclose_later.`DTE` IS NULL AND Copen_earlier .`DTE` IS NULL
GROUP BY trading_day;
SELECT
a.trading_day, a.min_price, a.max_price,
b.price as opn_price,
c.price as cls_price
FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
MIN(`dte`) AS open_date,
MAX(`dte`) AS close_date
FROM `CHART_DATA`
GROUP BY trading_day) a
LEFT JOIN
`CHART_DATA` b ON b.dte = a.open_date
LEFT JOIN
`CHART_DATA` c ON c.dte = a.close_date
Note: this solution may present some problems if your opening or closing entry has the exact same date/time value as another row (i.e. the transaction that came immediately after opening, or immediately before closing). To address this, I would suggest that you add a sequence number that is stored in a way that guarantees uniqueness, and increasing-with-respect-to-time. If you do this, then you would use the sequence value in the place of dte
to replace the open_date
and close_date
I've used as join fields in my example.
精彩评论