SQL - Query Multiple Aggregations based on DateTime - MySQL
This is a complex one. But I have a table which has a DATETIME field, and a few other int and float fields which need to be summed and averaged. We want to do the summing and averaging on this table based on the time stamps and ultimately would aim to develop 3 queries which in a sense would build on one another.
So the able looks like this
TIMESTAMP |subj_diff| SCR2 | SCR3
2011-09-20 09:01:37 | 1 | 0.02 | 1.6
2011-09-20 09:04:18 | 3 | 0.09 | 1.8
2011-09-20 14:24:55 | 5 | 0.21 | 1.2
2011-09-21 18:50:47 | 8 | 0.08 | 0.9
2011-09-21 18:54:21 | 9 | 0.12 | 2.1
The three queries that we would like to generate are:
1. Sum up all the preceding items from a previous data up to and including the currently select record. There should also be another column with the total So say for example if we wanted the results between the 20th and 21st the returned table would look like:
TIMESTAMP |subj_diff| SCR2 | SCR3 | COUNT
2011-09-20 09:01:37 | 1 | 0.02 | ... | 1
2011-09-20 09:04:18 | 4 | 0.11 | | 2
2011-09-20 14:24:55 | 9 | 0.32 | | 3
2011-09-21 18:50:47 | 17 | ...
2011-09-21 18:54:21 | 26 |
2. Sum up the results at 5 minute time intervals - similar to the above however the query would return 3 rows as rows 1 & 2 and rows 4&5 would be summed together in the same fashion as above. IN this query its ok if for each 5 min interval that has nothing 0 is returned with a count of 0. E.g.
TIMESTAMP |subj_diff| SCR2 | SCR3 | COUNT
2011-09-20 09:05:00 | 4 | 0.11 | 3.4 | 2
2011-09-20 14:25:00 | 5 | 0.21 | 1.2 | 1
2011-09-21 18:55:00 | 17 | 0.20 | 3.0 | 2
3. Do the same thing in query number 1 for the result set of query number two for every 5 minute interval in the day (i.e. from 00:05:00 to 24:00:00).
This is a rather tricky one, I have no idea how to start this one. Would anyone be able to write SQL to solve this problem?
Heres some basic code using cursors and stored procs but it doesnt really work.
DROP PROCEDURE curdemo;
DELIMITER $$
CREATE PROCEDURE curdemo()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE a datetime;
DECLARE b,c FLOAT;
DECLARE cur1 CURSOR FOR
SELECT msgDate, subj_diff FROM classifier_results
WHERE DATE(msgDate) >= DATE('2011-09-25');
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
CREATE TEMPORARY TAB开发者_JAVA百科LE IF NOT EXISTS temp_scores (d datetime, acc float);
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO a, b;
IF done THEN
LEAVE read_loop;
END IF;
INSERT temp_scores(d,acc)
SELECT a, SUM(subj_diff) FROM classifier_results
WHERE DATE(msgDate) >= DATE('2011-09-25')
AND msgDate <= a;
END LOOP;
CLOSE cur1;
SELECT * FROM temp_scores;
END;
Cheers!
Try this code -
Create and populale table:
CREATE TABLE classifier_results(
`TIMESTAMP` DATETIME NOT NULL,
subj_diff INT(11) DEFAULT NULL,
scr2 FLOAT(10, 5) DEFAULT NULL,
scr3 FLOAT(10, 5) DEFAULT NULL
);
INSERT INTO classifier_results VALUES
('2011-09-20 09:01:37', 1, 0.02000, 1.60000),
('2011-09-20 09:04:18', 3, 0.09000, 1.80000),
('2011-09-20 14:24:55', 5, 0.21000, 1.20000),
('2011-09-21 18:50:47', 8, 0.08000, 0.90000),
('2011-09-21 18:54:21', 9, 0.12000, 2.10000);
And execute these queries:
-- 1 query
SET @subj_diff = 0;
SET @scr2 = 0;
SET @scr3 = 0;
SET @cnt = 0;
SELECT timestamp,
@subj_diff:=IF(@subj_diff IS NULL, subj_diff, @subj_diff + subj_diff) subj_diff,
@scr2:=IF(@scr2 IS NULL, scr2, @scr2 + scr2) scr2,
@scr3:=IF(@scr3 IS NULL, scr3, @scr3 + scr3) scr3,
@cnt:=@cnt+1 count
FROM classifier_results;
+---------------------+-----------+---------+---------+-------+
| timestamp | subj_diff | scr2 | scr3 | count |
+---------------------+-----------+---------+---------+-------+
| 2011-09-20 09:01:37 | 1 | 0.02000 | 1.60000 | 1 |
| 2011-09-20 09:04:18 | 4 | 0.11000 | 3.40000 | 2 |
| 2011-09-20 14:24:55 | 9 | 0.32000 | 4.60000 | 3 |
| 2011-09-21 18:50:47 | 17 | 0.40000 | 5.50000 | 4 |
| 2011-09-21 18:54:21 | 26 | 0.52000 | 7.60000 | 5 |
+---------------------+-----------+---------+---------+-------+
-- 2 query
SELECT
DATE(timestamp) + INTERVAL 5 * (12 * HOUR(timestamp) + FLOOR(MINUTE(timestamp) / 5)) MINUTE new_timestamp,
SUM(subj_diff) subj_diff,
SUM(scr2) scr2,
SUM(scr3) scr3,
COUNT(*) count
FROM classifier_results
GROUP BY new_timestamp;
+---------------------+-----------+---------+---------+-------+
| new_timestamp | subj_diff | scr2 | scr3 | count |
+---------------------+-----------+---------+---------+-------+
| 2011-09-20 09:00:00 | 4 | 0.11000 | 3.40000 | 2 |
| 2011-09-20 14:20:00 | 5 | 0.21000 | 1.20000 | 1 |
| 2011-09-21 18:50:00 | 17 | 0.20000 | 3.00000 | 2 |
+---------------------+-----------+---------+---------+-------+
-- 3 query
SET @subj_diff = 0;
SET @scr2 = 0;
SET @scr3 = 0;
SET @cnt = 0;
SELECT new_timestamp timestamp,
@subj_diff:=IF(@subj_diff IS NULL, subj_diff, @subj_diff + subj_diff) subj_diff,
@scr2:=IF(@scr2 IS NULL, scr2, @scr2 + scr2) scr2,
@scr3:=IF(@scr3 IS NULL, scr3, @scr3 + scr3) scr3,
@cnt:=@cnt+1 count
FROM (
SELECT
DATE(timestamp) + INTERVAL 5 * (12 * HOUR(timestamp) + FLOOR(MINUTE(timestamp) / 5)) MINUTE new_timestamp,
SUM(subj_diff) subj_diff,
SUM(scr2) scr2,
SUM(scr3) scr3,
COUNT(*) count
FROM classifier_results
GROUP BY new_timestamp
) t;
+---------------------+-----------+---------+---------+-------+
| timestamp | subj_diff | scr2 | scr3 | count |
+---------------------+-----------+---------+---------+-------+
| 2011-09-20 09:00:00 | 4 | 0.11000 | 3.40000 | 1 |
| 2011-09-20 14:20:00 | 9 | 0.32000 | 4.60000 | 2 |
| 2011-09-21 18:50:00 | 26 | 0.52000 | 7.60000 | 3 |
+---------------------+-----------+---------+---------+-------+
Good luck!
I haven't tested this, but let me know if this works for you:
1.
SET @csum:=0;
SELECT a.msgDate, (@csum:=@csum + a.subj_diff) AS subj_diff
FROM classifier_results a
2.
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(a.msgDate)/(60*5))*(60*5)) as msgDate, sum(a.subj_diff) AS subj_diff
FROM classifier_results a
3.
SET @csum:=0;
SELECT b.msgDate, (@csum:=@csum + b.subj_diff) AS subj_diff
FROM (
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(a.msgDate)/(60*5))*(60*5)) as msgDate, sum(a.subj_diff) AS subj_diff
FROM classifier_results a
) b
Lets see here...
1:
SELECT TIMESTAMP,
(@var_subj_diff := @var_subj_diff + subj_diff) AS subj_diff,
(@var_SCR2 := @var_SCR2 + SCR2) AS SCR2,
(@var_SCR3 := @var_SCR3 + SCR3) AS SCR3,
(@rownum := @rownum + 1) AS COUNT
FROM classifier_results,
(SELECT @var_subj_diff := 0, @var_SCR2 := 0, @var_SCR3 := 0, @rownum := 0) AS vars
WHERE TIMESTAMP BETWEEN '2011-09-20' AND '2011-09-21'
ORDER BY TIMESTAMP ASC
2:
SELECT FROM_UNIXTIME(ROUND(UNIX_TIMESTAMP(TIMESTAMP) / (60 * 5)) * (60 * 5)) AS TIMESTAMP,
SUM(subj_diff) AS subj_diff, SUM(SCR2) AS SCR2, SUM(SCR3) AS SCR3, COUNT(*) AS COUNT
FROM classifer_results
GROUP BY TIMESTAMP
ORDER BY TIMESTAMP ASC
3:
SELECT FROM_UNIXTIME(ROUND(UNIX_TIMESTAMP(TIMESTAMP) / (60 * 5)) * (60 * 5)) AS TIMESTAMP,
(@var_subj_diff := @var_subj_diff + SUM(subj_diff)) AS subj_diff,
(@var_SCR2 := @var_SCR2 + SUM(SCR2)) AS SCR2,
(@var_SCR3 := @var_SCR3 + SUM(SCR3)) AS SCR3,
(@rownum := @rownum + 1) AS COUNT
FROM classifier_results,
(SELECT @var_subj_diff := 0, @var_SCR2 := 0, @var_SCR3 := 0, @rownum := 0) AS vars
GROUP BY TIMESTAMP
WHERE TIMESTAMP BETWEEN '2011-09-20' AND '2011-09-21'
ORDER BY TIMESTAMP ASC
Hope I understand correctly, and let me know it some problems show up. :)
It looks to me that you might want to delve into database programming using stored procedures and cursors.
Stored Procedures: http://dev.mysql.com/doc/refman/5.5/en/stored-programs-defining.html
Cursors: http://dev.mysql.com/doc/refman/5.5/en/cursors.html
This is a huge ball-o-wax which is beyond the scope of this post.
精彩评论