MySQL GROUP BY and Fill Empty Rows
I'm sure this has been answered before, but I can't find it in the detail that I need.
For an analytics system I need to be able to group rows and return them on a chart, either grouped by minute开发者_StackOverflow中文版, hour, day, month, or year. I have this working correctly (example code below).
SELECT COUNT( DISTINCT user_id ) , `hour` , `timestamp`
FROM tracking_request
WHERE site_id = '3'
AND `timestamp` < '2011-08-31 04:05:45'
AND `timestamp` > '2011-08-29 22:00:00'
GROUP BY `hour` , `day` , `month` , `year`
ORDER BY `timestamp` ASC
The problem is, like most charts, I need to fill the gaps where the data isn't (eg. no rows for the last 3 minutes). I read about creating a "calendar table" and joining that data, but how can I do this efficiently for each scale (eg. year would be a lot easier then minute, as minute would require MANY rows in the table)? If it helps, there is a column for each in the table (like above, you can see there is "hour", "day", etc.)
EDIT:
I ended up using PHP to accoplish this by using an empty array and then filling it. If anyone could think of an all (or mostly) SQL solution for this that would be more awesome.
In this answer I will outline how to generate your calendar tables.
Create three tables for days, hours and minutes:
CREATE TABLE days (
day DATE,
PRIMARY KEY (day)
)
CREATE TABLE hours (
hour INT,
PRIMARY KEY (hour)
)
CREATE TABLE minutes (
minute INT,
PRIMARY KEY (minute)
)
Fill the hours table with the numbers from 0 to 23 and the minutes table with the numbers from 0 to 59. To fill the days table, you can create a procedure like the following:
CREATE PROCEDURE make_days(IN start_date DATE, IN end_date DATE)
BEGIN
DECLARE curr_date DATE;
SET curr_date = start_date;
WHILE curr_date <= end_date DO
INSERT IGNORE INTO days(day) VALUES(curr_date);
SET curr_date = DATE_ADD(curr_date, INTERVAL 1 DAY);
END WHILE;
END
You can then call this procedure to create days like this:
CALL make_days('2011-01-01','2012-12-31');
Now you can create values for every minute in a given time interval using a query similar to the following:
SELECT YEAR(day) AS year, MONTH(day) AS month, DAYOFMONTH(day) AS day, hour, minute
FROM days, hours, minutes
WHERE CAST(CONCAT(day,' ',hour,':',minute) AS DATETIME) BETWEEN '2011-08-31 22:00' AND '2011-09-01 10:00'
ORDER BY year, month, day, hour, minute
This is typically done in data warehouses by having a date dimension that has a list of all possible dates. You do an OUTER JOIN to the date dimension and COALESCE the null values to 0.
精彩评论