
Multiple GROUP BY's & sort by SUM'd group values

I'm working on the reports for our time tracking app. Each time entry is related to a project and a service. Here is a simplified query that groups time entries by project and service:

SELECT                    
  projects.name as project_name,
  services.name as service_name,
  SUM(minutes) AS minutes 
FROM `time_entries`
JOIN `projects` ON `projects`.id = `time_entries`.project_id 
JOIN `services` ON `services`.id = `time_entries`.service_id 
GROUP BY 
  time_entries.project_id, 
  time_entries.service_id    
ORDER BY
  max(minutes)   DESC

This results in a table like this:

+---------------+--------------+---------+
| project_name  | service_name | minutes |
+---------------+--------------+---------+
| Business Card | Consulting   |    4800 |
| Microsite     | Coding       |    3200 |
| Microsite     | Consulting   |    2400 |
| Microsite     | Design       |    2400 |
| Business Card | Design       |     800 |
+---------------+--------------+---------+
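To make the example reproducible, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The schema (projects, services, time_entries.minutes) follows the query above. The individual time entries are hypothetical, chosen so that the grouped sums match the table. Unlike the query above, the sketch sorts on the summed minutes, which is the order the table shows. It also adds name tie-breakers so that rows with equal sums come out in a deterministic order.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the MySQL schema with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE time_entries (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            service_id INTEGER,
            minutes INTEGER NOT NULL
        );
        INSERT INTO projects (id, name) VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services (id, name) VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        -- Several entries per (project, service) pair, so SUM() has something to add.
        INSERT INTO time_entries (project_id, service_id, minutes) VALUES
            (1, 1, 3000), (1, 1, 1800), (1, 3, 800),
            (2, 2, 2000), (2, 2, 1200), (2, 1, 2400),
            (2, 3, 1000), (2, 3, 1400);
    """)
    return conn


# One row per (project, service), largest sums first.
GROUPED_SQL = """
    SELECT projects.name AS project_name,
           services.name AS service_name,
           SUM(time_entries.minutes) AS minutes
    FROM time_entries
    JOIN projects ON projects.id = time_entries.project_id
    JOIN services ON services.id = time_entries.service_id
    GROUP BY time_entries.project_id, time_entries.service_id,
             projects.name, services.name
    ORDER BY minutes DESC, project_name, service_name
"""

rows = build_db().execute(GROUPED_SQL).fetchall()
for row in rows:
    print(row)
```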

What I'm trying to achieve, though, is to sort by the summed minutes per project. The project »Microsite« should be on top rather than »Business Card«, because it has more minutes in total (3200 + 2400 + 2400 = 8000 versus 4800 + 800 = 5600):

+---------------+--------------+-----------------+---------+
| project_name  | service_name | project_minutes | minutes |
+---------------+--------------+-----------------+---------+
| Microsite     | Coding       |            8000 |    3200 |
| Microsite     | Consulting   |            8000 |    2400 |
| Microsite     | Design       |            8000 |    2400 |
| Business Card | Consulting   |            5600 |    4800 |
| Business Card | Design       |            5600 |     800 |
+---------------+--------------+-----------------+---------+
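Window functions are available in MySQL 8.0 or later and SQLite 3.25 or later. On such a server, a column like project_minutes can be produced in one statement without an intermediate table. First group by project and service in a derived table, then add SUM(minutes) OVER (PARTITION BY project_id) in the outer query. Here is a minimal sketch, again using Python's sqlite3 as a stand-in for MySQL with hypothetical sample entries:

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the MySQL schema with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE time_entries (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            service_id INTEGER,
            minutes INTEGER NOT NULL
        );
        INSERT INTO projects (id, name) VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services (id, name) VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        INSERT INTO time_entries (project_id, service_id, minutes) VALUES
            (1, 1, 3000), (1, 1, 1800), (1, 3, 800),
            (2, 2, 2000), (2, 2, 1200), (2, 1, 2400),
            (2, 3, 1000), (2, 3, 1400);
    """)
    return conn


# Inner query: one row per (project, service).
# Outer query: the window SUM adds each project's total to every one of its rows.
WINDOW_SQL = """
    SELECT project_name, service_name,
           SUM(minutes) OVER (PARTITION BY project_id) AS project_minutes,
           minutes
    FROM (
        SELECT te.project_id, p.name AS project_name, s.name AS service_name,
               SUM(te.minutes) AS minutes
        FROM time_entries te
        JOIN projects p ON p.id = te.project_id
        JOIN services s ON s.id = te.service_id
        GROUP BY te.project_id, te.service_id, p.name, s.name
    ) AS g
    ORDER BY project_minutes DESC, minutes DESC, service_name
"""

rows = build_db().execute(WINDOW_SQL).fetchall()
for row in rows:
    print(row)
```

The derived table keeps the GROUP BY and the window function in separate query levels. This makes the sketch easy to read and avoids relying on nested aggregates such as SUM(SUM(...)) OVER (...).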

The only way I've found to get the column »project_minutes« is to create a table first and join it with itself. The query I came up with:

DROP TABLE IF EXISTS group2;    
CREATE TABLE group2     SELECT                     
  projects.id as project_id,
  projects.name as project_name,
  services.name as service_name,
  SUM(minutes) AS minutes 
FROM `time_entries`             
JOIN `projects` ON `projects`.id = `time_entries`.project_id 
JOIN `services` ON `services`.id = `time_entries`.service_id 
GROUP BY 
  time_entries.project_id, 
  time_entries.service_id    
ORDER BY
  max(minutes)   DESC
LIMIT 0, 30;

SELECT 
  project_name, service_name, project_minutes, minutes
FROM  
  group2
LEFT JOIN 
  (
    SELECT project_id as project_id, sum(minutes) AS project_minutes
      FROM group2
     GROUP BY project_id         
  ) as group1  on group1.project_id = group2.project_id
ORDER BY 
  project_minutes DESC, 
  minutes DESC;    

I can't even use a temporary table instead, because MySQL refuses to refer to a TEMPORARY table more than once in the same query ("Can't reopen table"; a MySQL bug?): http://www.google.com/search?&q=site:bugs.mysql.com+reopen+temporary+table

My questions:

  1. What would be the best way to get a column like »project_minutes« that SUMs up a group's minutes and adds the result as an extra column? Is there a neat SQL trick I'm not aware of?
  2. If you don't see a way for my first question, do you think it makes sense to create an extra table for each query? Is it faster than doing this logic afterwards in application code? We use Rails, in case that makes any difference.
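For the second question, the application-side variant could look like the minimal sketch below. It is written in Python rather than Ruby, purely for illustration. It takes the rows from the grouped query and adds up a per-project total in a dictionary. It then sorts on the project total first and the group's own minutes second. The input rows are the ones from the first table above.

```python
from collections import defaultdict

# (project_name, service_name, minutes) rows as returned by the GROUP BY query.
grouped_rows = [
    ("Business Card", "Consulting", 4800),
    ("Microsite", "Coding", 3200),
    ("Microsite", "Consulting", 2400),
    ("Microsite", "Design", 2400),
    ("Business Card", "Design", 800),
]


def add_project_minutes(rows):
    """Return (project, service, project_minutes, minutes), biggest projects first."""
    totals = defaultdict(int)
    for project, _service, minutes in rows:
        totals[project] += minutes
    result = [(p, s, totals[p], m) for p, s, m in rows]
    # Project total descending, project name (keeps equal totals grouped),
    # then group minutes descending, then service name.
    result.sort(key=lambda r: (-r[2], r[0], -r[3], r[1]))
    return result


for row in add_project_minutes(grouped_rows):
    print(row)
```

The extra work is proportional to the number of grouped rows, not to the number of time entries, because the database has already done the heavy aggregation.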

Thanks a lot for your help!

UPDATE

Thanks for your replies so far. I'm summarizing them as a gist to get a better overview: http://gist.github.com/553560

Am I right that there is no other way than querying the time_entries table once per GROUP BY level? If so, do you see performance problems, given the following facts:

  1. The table time_entries is by far the one with the most rows (~4 million)
  2. The user can group by up to 6 columns. Have a look at this screenshot: http://dl.dropbox.com/u/732913/time_entries_grouped_by_customer_project_service_user.png
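SUM is additive, so every coarser subtotal (per customer, per customer and project, and so on) can be derived from the rows of the finest grouping. That means time_entries only has to be aggregated once, and the roll-up to the coarser levels can happen on the small grouped result. Below is a minimal sketch of that roll-up in application code for an arbitrary number of grouping columns. The function names and the customer/project/service sample rows (including the customers "Acme" and "Globex") are hypothetical.

```python
from collections import defaultdict


def rollup_totals(rows, depth):
    """Subtotals for every prefix of the grouping key.

    rows:  (key_tuple, minutes) pairs from the finest GROUP BY,
           one key value per grouping column.
    depth: number of grouping columns.
    """
    totals = defaultdict(int)
    for key, minutes in rows:
        for level in range(1, depth + 1):
            totals[key[:level]] += minutes
    return dict(totals)


def sort_hierarchically(rows, depth):
    """Order rows so that bigger groups come first at every level."""
    totals = rollup_totals(rows, depth)

    def sort_key(row):
        key, _minutes = row
        # One (descending subtotal, label) pair per level; str() keeps NULL labels comparable.
        return [(-totals[key[:level]], str(key[level - 1])) for level in range(1, depth + 1)]

    return sorted(rows, key=sort_key)


# Hypothetical finest-grained rows: (customer, project, service) -> minutes.
sample_rows = [
    (("Acme", "Business Card", "Consulting"), 4800),
    (("Acme", "Business Card", "Design"), 800),
    (("Acme", "Microsite", "Coding"), 3200),
    (("Acme", "Microsite", "Consulting"), 2400),
    (("Acme", "Microsite", "Design"), 2400),
    (("Globex", "Flyer", "Design"), 9000),
]

totals = rollup_totals(sample_rows, 3)
ordered = sort_hierarchically(sample_rows, 3)
for key, minutes in ordered:
    print(" / ".join(key), minutes, totals[key[:1]], totals[key[:2]])
```

"Globex" has the single largest row (9000), but its customer total (9000) is below Acme's (13600), so all Acme rows come first.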


Something like this should do what you want:

SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name,
  ilv2.minutes AS project_minutes, ilv1.minutes
FROM
( SELECT
  te1.date_at,
  p1.name as project_name,
  s1.name as service_name,
  SUM(te1.minutes) AS minutes
FROM time_entries te1
LEFT OUTER JOIN projects p1 ON p1.id = te1.project_id
LEFT OUTER JOIN services s1 ON s1.id = te1.service_id
GROUP BY
  te1.date_at,
  te1.project_id,
  te1.service_id) AS ilv1,
( SELECT
  te2.date_at,
  p2.name as project_name,
  SUM(te2.minutes) AS minutes
FROM time_entries te2
LEFT OUTER JOIN projects p2 ON p2.id = te2.project_id
GROUP BY
  te2.date_at,
  te2.project_id) AS ilv2
WHERE ilv1.date_at=ilv2.date_at AND ilv1.project_name=ilv2.project_name
ORDER BY ilv2.minutes DESC, ilv1.minutes DESC;

(Do you really, really need all those outer joins? They are going to hurt performance a lot.)
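A quick way to sanity-check this shape of query is to run it against a small in-memory database. This minimal sketch uses Python's sqlite3 as a stand-in for MySQL, with hypothetical time entries that all fall on one date. The query follows the answer's structure: one derived table per grouping level, joined on date and project name. In the sketch, the aliases in the second derived table are kept consistent, date_at is added to both GROUP BY lists, and rows are sorted on the project total in descending order.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL schema; all entries are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE time_entries (
            id INTEGER PRIMARY KEY,
            date_at TEXT NOT NULL,
            project_id INTEGER NOT NULL,
            service_id INTEGER,
            minutes INTEGER NOT NULL
        );
        INSERT INTO projects (id, name) VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services (id, name) VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        INSERT INTO time_entries (date_at, project_id, service_id, minutes) VALUES
            ('2010-08-27', 1, 1, 3000), ('2010-08-27', 1, 1, 1800),
            ('2010-08-27', 1, 3, 800),  ('2010-08-27', 2, 2, 2000),
            ('2010-08-27', 2, 2, 1200), ('2010-08-27', 2, 1, 2400),
            ('2010-08-27', 2, 3, 1000), ('2010-08-27', 2, 3, 1400);
    """)
    return conn


# ilv1: per (date, project, service); ilv2: per (date, project).
ANSWER1_SQL = """
    SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name,
           ilv2.minutes AS project_minutes, ilv1.minutes
    FROM
      (SELECT te1.date_at, p1.name AS project_name, s1.name AS service_name,
              SUM(te1.minutes) AS minutes
       FROM time_entries te1
       LEFT OUTER JOIN projects p1 ON p1.id = te1.project_id
       LEFT OUTER JOIN services s1 ON s1.id = te1.service_id
       GROUP BY te1.date_at, te1.project_id, te1.service_id) AS ilv1,
      (SELECT te2.date_at, p2.name AS project_name, SUM(te2.minutes) AS minutes
       FROM time_entries te2
       LEFT OUTER JOIN projects p2 ON p2.id = te2.project_id
       GROUP BY te2.date_at, te2.project_id) AS ilv2
    WHERE ilv1.date_at = ilv2.date_at AND ilv1.project_name = ilv2.project_name
    ORDER BY project_minutes DESC, ilv1.minutes DESC, ilv1.service_name
"""

rows = build_db().execute(ANSWER1_SQL).fetchall()
for row in rows:
    print(row)
```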

It will probably be a lot more efficient to use a materialized view based on your original query (and a two-pass query with different groupings, as above). But a halfway house might be to use the same base query twice and wrap one copy in a consolidation block, e.g.

SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name, ilv1.minutes
FROM 
 (....) ilv1,
 (SELECT ilv3.date_at, ilv3.project_name, sum(ilv3.minutes) as minutes 
  FROM (...copy of ilv1) ilv3
  GROUP BY ilv3.date_at, ilv3.project_name
 ) ilv2
WHERE ilv1.date_at=ilv2.date_at
AND ilv1.project_name=ilv2.project_name
ORDER BY ilv2.minutes DESC, ilv1.minutes DESC;
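The copy-and-wrap idea can be sketched by building the SQL string so that the same base subquery appears twice. It appears once as ilv1, and once inside an outer GROUP BY that produces the per-project totals from the already grouped rows rather than from time_entries. Python's sqlite3 again stands in for MySQL, and the sample entries are hypothetical. On MySQL 8.0 or later, a WITH clause could name the base query once instead of pasting it twice.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL schema; all entries are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE time_entries (
            id INTEGER PRIMARY KEY,
            date_at TEXT NOT NULL,
            project_id INTEGER NOT NULL,
            service_id INTEGER,
            minutes INTEGER NOT NULL
        );
        INSERT INTO projects (id, name) VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services (id, name) VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        INSERT INTO time_entries (date_at, project_id, service_id, minutes) VALUES
            ('2010-08-27', 1, 1, 3000), ('2010-08-27', 1, 1, 1800),
            ('2010-08-27', 1, 3, 800),  ('2010-08-27', 2, 2, 2000),
            ('2010-08-27', 2, 2, 1200), ('2010-08-27', 2, 1, 2400),
            ('2010-08-27', 2, 3, 1000), ('2010-08-27', 2, 3, 1400);
    """)
    return conn


# The base query: one row per (date, project, service).
BASE = """
    SELECT te.date_at, p.name AS project_name, s.name AS service_name,
           SUM(te.minutes) AS minutes
    FROM time_entries te
    JOIN projects p ON p.id = te.project_id
    LEFT JOIN services s ON s.id = te.service_id
    GROUP BY te.date_at, te.project_id, te.service_id
"""

# ilv2 consolidates a second copy of the base query (ilv3) instead of re-reading time_entries.
SQL = f"""
    SELECT ilv1.date_at, ilv1.project_name, ilv1.service_name,
           ilv2.minutes AS project_minutes, ilv1.minutes
    FROM ({BASE}) ilv1,
         (SELECT ilv3.date_at, ilv3.project_name, SUM(ilv3.minutes) AS minutes
          FROM ({BASE}) ilv3
          GROUP BY ilv3.date_at, ilv3.project_name) ilv2
    WHERE ilv1.date_at = ilv2.date_at
      AND ilv1.project_name = ilv2.project_name
    ORDER BY project_minutes DESC, ilv1.minutes DESC, ilv1.service_name
"""

rows = build_db().execute(SQL).fetchall()
for row in rows:
    print(row)
```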

C.


I assume that project_id in time_entries is always NOT NULL, and that service_id can be NULL (hence the LEFT JOIN on services):

SELECT t.date_at, t.project_name, t.service_name, p.minutes AS project_minutes, t.minutes
FROM
(SELECT
  time_entries.date_at,
  time_entries.project_id,
  projects.name AS project_name,
  services.name AS service_name,
  SUM(minutes) AS minutes
FROM time_entries
JOIN projects ON projects.id = time_entries.project_id
LEFT JOIN services ON services.id = time_entries.service_id
GROUP BY
  time_entries.date_at,
  time_entries.project_id,
  time_entries.service_id
) t
JOIN
  (SELECT date_at, project_id, SUM(minutes) minutes
  FROM time_entries
  GROUP BY date_at, project_id) p
ON (p.date_at = t.date_at AND p.project_id = t.project_id)
ORDER BY p.minutes DESC, t.minutes DESC;
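The minimal sketch below runs this join against hypothetical sample data in an in-memory SQLite database, with Python's sqlite3 standing in for MySQL. One extra entry deliberately has a NULL service_id. It shows that the LEFT JOIN on services keeps that entry, and that the project total still includes its minutes.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL schema; all entries are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE time_entries (
            id INTEGER PRIMARY KEY,
            date_at TEXT NOT NULL,
            project_id INTEGER NOT NULL,
            service_id INTEGER,               -- may be NULL, per the assumption above
            minutes INTEGER NOT NULL
        );
        INSERT INTO projects (id, name) VALUES (1, 'Business Card'), (2, 'Microsite');
        INSERT INTO services (id, name) VALUES (1, 'Consulting'), (2, 'Coding'), (3, 'Design');
        INSERT INTO time_entries (date_at, project_id, service_id, minutes) VALUES
            ('2010-08-27', 1, 1, 3000), ('2010-08-27', 1, 1, 1800),
            ('2010-08-27', 1, 3, 800),  ('2010-08-27', 2, 2, 2000),
            ('2010-08-27', 2, 2, 1200), ('2010-08-27', 2, 1, 2400),
            ('2010-08-27', 2, 3, 1000), ('2010-08-27', 2, 3, 1400),
            ('2010-08-27', 2, NULL, 600);
    """)
    return conn


# t: per (date, project, service); p: per (date, project), read straight from time_entries.
ANSWER2_SQL = """
    SELECT t.date_at, t.project_name, t.service_name,
           p.minutes AS project_minutes, t.minutes
    FROM
      (SELECT te.date_at, te.project_id,
              projects.name AS project_name, services.name AS service_name,
              SUM(te.minutes) AS minutes
       FROM time_entries te
       JOIN projects ON projects.id = te.project_id
       LEFT JOIN services ON services.id = te.service_id
       GROUP BY te.date_at, te.project_id, te.service_id) t
    JOIN
      (SELECT date_at, project_id, SUM(minutes) AS minutes
       FROM time_entries
       GROUP BY date_at, project_id) p
    ON p.date_at = t.date_at AND p.project_id = t.project_id
    ORDER BY p.minutes DESC, t.minutes DESC, t.service_name
"""

rows = build_db().execute(ANSWER2_SQL).fetchall()
for row in rows:
    print(row)
```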
