Multiple GROUP BY's & sort by SUM'd group values
I'm working on the reports for our time tracking app. Each time entry is related to a project and a service. Here is a simplified query that groups time entries by project and service:
SELECT
projects.name as project_name,
services.name as service_name,
SUM(minutes) AS minutes
FROM `time_entries`
JOIN `projects` ON `projects`.id = `time_entries`.project_id
JOIN `services` ON `services`.id = `time_entries`.service_id
GROUP BY
time_entries.project_id,
time_entries.service_id
ORDER BY
SUM(minutes) DESC
This results in a table like this:
+---------------+--------------+---------+
| project_name | service_name | minutes |
+---------------+--------------+---------+
| Business Card | Consulting | 4800 |
| Microsite | Coding | 3200 |
| Microsite | Consulting | 2400 |
| Microsite | Design | 2400 |
| Business Card | Design | 800 |
+---------------+--------------+---------+
What I'm trying to achieve, though, is to sort by the summed minutes of each project. The project »Microsite« should be on top rather than »Business Card«, because it has more minutes in total (8000 vs. 5600):
+---------------+--------------+-----------------+---------+
| project_name | service_name | project_minutes | minutes |
+---------------+--------------+-----------------+---------+
| Microsite | Coding | 8000 | 3200 |
| Microsite | Consulting | 8000 | 2400 |
| Microsite | Design | 8000 | 2400 |
| Business Card | Consulting | 5600 | 4800 |
| Business Card | Design | 5600 | 800 |
+---------------+--------------+-----------------+---------+
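For reference: on engines with window functions (MySQL 8.0 and later, SQLite 3.25 and later), a column like »project_minutes« can be computed in the same grouped query by wrapping the per-group SUM in a windowed SUM partitioned by project. The sketch below runs such a query through Python's built-in sqlite3 module; the schema is a minimal assumption and the individual sample rows are invented so that their totals reproduce the tables above.

```python
import sqlite3

# Assumed minimal schema; sample rows are invented so that the per-group
# totals match the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (id INTEGER PRIMARY KEY, project_id INTEGER,
                               service_id INTEGER, minutes INTEGER);
    INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
    INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
    INSERT INTO time_entries (project_id, service_id, minutes) VALUES
        (1, 1, 3000), (1, 1, 1800), (1, 3, 800),
        (2, 1, 2400), (2, 2, 3200), (2, 3, 1200), (2, 3, 1200);
""")

# The inner SUM is the (project, service) group total; the outer
# SUM(...) OVER (PARTITION BY project_id) adds those group totals per project.
QUERY = """
    SELECT p.name AS project_name,
           s.name AS service_name,
           SUM(SUM(te.minutes)) OVER (PARTITION BY te.project_id) AS project_minutes,
           SUM(te.minutes) AS minutes
    FROM time_entries te
    JOIN projects p ON p.id = te.project_id
    JOIN services s ON s.id = te.service_id
    GROUP BY te.project_id, te.service_id
    -- Project total first; names only break ties so projects stay together.
    ORDER BY project_minutes DESC, project_name, SUM(te.minutes) DESC, service_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This is a single pass over time_entries. It does not help on MySQL 5.x, which has no window functions.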
The only way I've found to get the column »project_minutes« is to create a table first and join it with itself. The query I came up with:
DROP TABLE IF EXISTS group2;
CREATE TABLE group2 SELECT
projects.id as project_id,
projects.name as project_name,
services.name as service_name,
SUM(minutes) AS minutes
FROM `time_entries`
JOIN `projects` ON `projects`.id = `time_entries`.project_id
JOIN `services` ON `services`.id = `time_entries`.service_id
GROUP BY
time_entries.project_id,
time_entries.service_id
ORDER BY
SUM(minutes) DESC
LIMIT 0, 30;
SELECT
project_name, service_name, project_minutes, minutes
FROM
group2
LEFT JOIN
(
SELECT project_id as project_id, sum(minutes) AS project_minutes
FROM group2
GROUP BY project_id
) as group1 on group1.project_id = group2.project_id
ORDER BY
project_minutes DESC,
minutes DESC;
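I'm not even able to use a temporary table instead, due to a MySQL bug (?): http://www.google.com/search?&q=site:bugs.mysql.com+reopen+temporary+table. It seems a TEMPORARY table cannot be referred to more than once in the same query, which is exactly what the self-join above does.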
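One way around both the extra table and the temporary-table limitation is a common table expression (WITH ... AS), which MySQL supports from 8.0 and which may be referenced more than once in the same statement. A minimal sketch using Python's sqlite3 module and the asker's group2/group1 names; the schema and sample rows are assumptions chosen to reproduce the tables above:

```python
import sqlite3

# Assumed minimal schema with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (id INTEGER PRIMARY KEY, project_id INTEGER,
                               service_id INTEGER, minutes INTEGER);
    INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
    INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
    INSERT INTO time_entries (project_id, service_id, minutes) VALUES
        (1, 1, 3000), (1, 1, 1800), (1, 3, 800),
        (2, 1, 2400), (2, 2, 3200), (2, 3, 1200), (2, 3, 1200);
""")

# group2 is defined once as a CTE and then read twice: directly, and again
# inside the group1 derived table that sums it per project.
QUERY = """
    WITH group2 AS (
        SELECT te.project_id, p.name AS project_name, s.name AS service_name,
               SUM(te.minutes) AS minutes
        FROM time_entries te
        JOIN projects p ON p.id = te.project_id
        JOIN services s ON s.id = te.service_id
        GROUP BY te.project_id, te.service_id
    )
    SELECT group2.project_name, group2.service_name,
           group1.project_minutes, group2.minutes
    FROM group2
    LEFT JOIN (SELECT project_id, SUM(minutes) AS project_minutes
               FROM group2
               GROUP BY project_id) AS group1
      ON group1.project_id = group2.project_id
    ORDER BY group1.project_minutes DESC, group2.project_name,
             group2.minutes DESC, group2.service_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```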
My questions:
- What would be the best way to get a column like »project_minutes« that sums up a group's minutes and adds the result as an extra column? Is there a neat SQL trick I'm not aware of?
- If not, does it make sense to create an extra table for each query? Is that faster than doing this logic afterwards in application code? We use Rails, in case that makes any difference.
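For the application-code variant, note that once the database has done the GROUP BY, the result has one row per group rather than one per time entry, so summing per project afterwards only touches a handful of rows. A sketch in Python (the same logic carries over to Ruby in a Rails app), taking the grouped rows from the first table as input; the helper name add_project_totals is hypothetical:

```python
from collections import defaultdict

# Output of the simple GROUP BY query: (project_name, service_name, minutes).
grouped_rows = [
    ("Business Card", "Consulting", 4800),
    ("Microsite", "Coding", 3200),
    ("Microsite", "Consulting", 2400),
    ("Microsite", "Design", 2400),
    ("Business Card", "Design", 800),
]


def add_project_totals(rows):
    """Add each project's total minutes as a column and sort by it (hypothetical helper)."""
    totals = defaultdict(int)
    for project, _service, minutes in rows:
        totals[project] += minutes
    report = [
        (project, service, totals[project], minutes)
        for project, service, minutes in rows
    ]
    # Project total first; the project name keeps equally large projects
    # together, then group minutes, then service name as a final tiebreaker.
    report.sort(key=lambda r: (-r[2], r[0], -r[3], r[1]))
    return report


report = add_project_totals(grouped_rows)
for line in report:
    print(line)
```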
Thanks a lot for your help!
UPDATE
Thanks for your replies so far. I'm summarizing them as a gist to get a better overview: http://gist.github.com/553560
Am I right that there is no other way than querying the time_entries table once per GROUP BY level? If so, do you expect performance problems, given the following:
- The table time_entries is by far the one with the most rows (~4 million)
- The user can group by up to 6 columns. Have a look at this screenshot: http://dl.dropbox.com/u/732913/time_entries_grouped_by_customer_project_service_user.png
Something like this should do what you want:
SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name, ilv2.minutes AS project_minutes, ilv1.minutes
FROM
( SELECT
te1.date_at,
p1.name as project_name,
s1.name as service_name,
SUM(te1.minutes) AS minutes
FROM time_entries te1
LEFT OUTER JOIN projects p1 ON p1.id = te1.project_id
LEFT OUTER JOIN services s1 ON s1.id = te1.service_id
GROUP BY
te1.date_at,
te1.project_id,
te1.service_id) AS ilv1,
( SELECT
te2.date_at,
p2.name as project_name,
SUM(te2.minutes) AS minutes
FROM time_entries te2
LEFT OUTER JOIN projects p2 ON p2.id = te2.project_id
GROUP BY
te2.date_at,
te2.project_id) AS ilv2
WHERE ilv1.date_at=ilv2.date_at AND ilv1.project_name=ilv2.project_name
ORDER BY ilv2.minutes DESC, ilv1.minutes DESC;
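A runnable check of this two-derived-table approach, with the second subquery's aliases made consistent (te2/p2 throughout) and date_at added to both GROUP BY lists so that the join on date_at is well defined. It uses Python's sqlite3 module with an assumed minimal schema and invented sample rows, all on one day:

```python
import sqlite3

# Assumed minimal schema; sample rows are invented, all on the same day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (id INTEGER PRIMARY KEY, project_id INTEGER,
                               service_id INTEGER, date_at TEXT, minutes INTEGER);
    INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
    INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
    INSERT INTO time_entries (project_id, service_id, date_at, minutes) VALUES
        (1, 1, '2010-08-27', 3000), (1, 1, '2010-08-27', 1800),
        (1, 3, '2010-08-27', 800), (2, 1, '2010-08-27', 2400),
        (2, 2, '2010-08-27', 3200), (2, 3, '2010-08-27', 1200),
        (2, 3, '2010-08-27', 1200);
""")

# ilv1: totals per (day, project, service); ilv2: totals per (day, project).
QUERY = """
    SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name,
           ilv2.minutes AS project_minutes, ilv1.minutes
    FROM (SELECT te1.date_at, p1.name AS project_name, s1.name AS service_name,
                 SUM(te1.minutes) AS minutes
          FROM time_entries te1
          LEFT OUTER JOIN projects p1 ON p1.id = te1.project_id
          LEFT OUTER JOIN services s1 ON s1.id = te1.service_id
          GROUP BY te1.date_at, te1.project_id, te1.service_id) AS ilv1,
         (SELECT te2.date_at, p2.name AS project_name, SUM(te2.minutes) AS minutes
          FROM time_entries te2
          LEFT OUTER JOIN projects p2 ON p2.id = te2.project_id
          GROUP BY te2.date_at, te2.project_id) AS ilv2
    WHERE ilv1.date_at = ilv2.date_at AND ilv1.project_name = ilv2.project_name
    -- Project total first; names only break ties.
    ORDER BY ilv2.minutes DESC, ilv1.project_name, ilv1.minutes DESC, ilv1.service_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```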
(Do you really, really need all those outer joins? They are going to hurt performance a lot.)
It'll probably be a lot more efficient to use a materialized view based on your original query (and a two-pass query with different groupings, as above). But a halfway house might be to use the same base query twice and wrap one copy in a consolidation block, e.g.:
SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name, ilv1.minutes
FROM
(....) ilv1,
(SELECT ilv3.date_at, ilv3.project_name, sum(ilv3.minutes) as minutes
FROM (...copy of ilv1) ilv3
GROUP BY ilv3.date_at, ilv3.project_name
) ilv2
WHERE ilv1.date_at=ilv2.date_at
AND ilv1.project_name=ilv2.project_name ORDER BY ilv2.minutes;
C.
I assume that project_id in time_entries is always NOT NULL, while service_id can be NULL (hence the LEFT JOIN on services):
Select t.date_at, t.project_name, t.service_name, p.minutes as Project_minutes, t.minutes
FROM
(SELECT
time_entries.date_at,
time_entries.project_Id,
projects.name as project_name,
services.name as service_name,
SUM(minutes) AS minutes
FROM time_entries
JOIN projects ON projects.id = time_entries.project_id
LEFT JOIN services ON services.id = time_entries.service_id
GROUP BY
time_entries.date_at,
time_entries.project_id,
time_entries.service_id
) t
JOIN
(Select date_at, project_Id, Sum(minutes) minutes
from time_entries
group by date_at, project_id) p
ON (p.date_at = t.date_at AND p.project_id = t.project_id)
ORDER BY p.minutes DESC, t.minutes DESC;
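A runnable check of this query with Python's sqlite3 module. The schema and sample rows are assumptions; one entry deliberately has no service, to show that the LEFT JOIN keeps it (as a NULL service_name, which Python receives as None) and that its minutes still count toward the project total:

```python
import sqlite3

# Assumed minimal schema; sample rows are invented. The last entry has no service.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_entries (id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL,
                               service_id INTEGER, date_at TEXT, minutes INTEGER);
    INSERT INTO projects VALUES (1, 'Business Card'), (2, 'Microsite');
    INSERT INTO services VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
    INSERT INTO time_entries (project_id, service_id, date_at, minutes) VALUES
        (1, 1, '2010-08-27', 3000), (1, 1, '2010-08-27', 1800),
        (1, 3, '2010-08-27', 800), (2, 1, '2010-08-27', 2400),
        (2, 2, '2010-08-27', 3200), (2, 3, '2010-08-27', 1200),
        (2, 3, '2010-08-27', 1200), (1, NULL, '2010-08-27', 200);
""")

# t: totals per (day, project, service); p: totals per (day, project),
# joined on the ids rather than on the project name.
QUERY = """
    SELECT t.date_at, t.project_name, t.service_name,
           p.minutes AS project_minutes, t.minutes
    FROM (SELECT te.date_at, te.project_id, pr.name AS project_name,
                 sv.name AS service_name, SUM(te.minutes) AS minutes
          FROM time_entries te
          JOIN projects pr ON pr.id = te.project_id
          LEFT JOIN services sv ON sv.id = te.service_id
          GROUP BY te.date_at, te.project_id, te.service_id) t
    JOIN (SELECT date_at, project_id, SUM(minutes) AS minutes
          FROM time_entries
          GROUP BY date_at, project_id) p
      ON p.date_at = t.date_at AND p.project_id = t.project_id
    ORDER BY p.minutes DESC, t.project_name, t.minutes DESC, t.service_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```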