How do I get a SUM to calculate properly with a join?
So I'm trying to count the number of parts, number of tasks, the quantity in each job and the time that it took to manufacture each job but I'm getting some funky results. If I run this:
SELECT
j.id,
mf.special_instructions,
count(distinct p.id) as number_of_different_parts,
count(distinct t.id) as number_of_tasks,
SUM(distinct j.quantity) as number_of_assemblies,
SUM(l.time_elapsed) as time_elapsed
FROM
sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
j.id = p.job_id
INNER JOIN ramses.tasks t on
p.id = t.part_id
INNER JOIN ramses.batch_log l on
t.batch_id = l.batch_id
WHERE
mf.job_description LIKE "%BACKBLAZE%" OR
mf.customer_name LIKE "%BACKBLAZE%" OR
mf.customer_ref LIKE "%BACKBLAZE%" OR
mf.technical_company_name LIKE "%BACKBLAZE%" OR
mf.description LIKE "%BACKBLAZE%" OR
mf.name LIKE "%BACKBLAZE%" OR
mf.enclosure_style LIKE "%BACKBLAZE%" OR
mf.special_instr开发者_JAVA百科uctions LIKE "%BACKBLAZE%"
Group by j.id
and I now get accurate parts and tasks numbers but the time_elapsed sum isn't correct. What could the problem be?
When I try it with distinct
I get a veeeeery low number (like something between 1 and 30 when I'm looking for something closer to 10,000.)
UPDATE: here is the create code:
http://pastebin.com/nbhU9rYh
http://pastebin.com/tdmAkNr4
http://pastebin.com/0TFCUaeQ
http://pastebin.com/fugr8C9U
http://pastebin.com/Zq0bKG2L
http://pastebin.com/k5rESUrq
The relationships are like this:
- mf_job info is linked to a job
- jobs have parts
- parts have tasks
- tasks are in batches
- batch_log is a table with all of the starts and stops for the batches of tasks, it has a start_time and a stop_time and a time_elapsed.
I am trying to get all of the time_elapsed from the batch_log for each mf_job with the word backblaze in one of it's fields along with the number of parts, tasks and assemblies. This all needs to be grouped by job.id or mf_job.id
Try and rewrite the query to:
SELECT
j.id,
mf.special_instructions,
count(p.id) as number_of_different_parts,
count(t.id) as number_of_tasks,
SUM(j.quantity) as number_of_assemblies,
SEC_TO_TIME(SUM(l.seconds_elapsed)) as time_elapsed
FROM
sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
j.id = p.job_id
INNER JOIN ramses.tasks t on
p.id = t.part_id
INNER JOIN (
SELECT rl.batch_id
, SUM(TIME_TO_SEC(rl.time_elapsed)) as seconds_elapsed
FROM ramses.batch_log rl
GROUP BY rl.batch_id
) l ON (t.batch_id = l.batch_id)
WHERE
mf.job_description LIKE "%BACKBLAZE%" OR
mf.customer_name LIKE "%BACKBLAZE%" OR
mf.customer_ref LIKE "%BACKBLAZE%" OR
mf.technical_company_name LIKE "%BACKBLAZE%" OR
mf.description LIKE "%BACKBLAZE%" OR
mf.name LIKE "%BACKBLAZE%" OR
mf.enclosure_style LIKE "%BACKBLAZE%" OR
mf.special_instructions LIKE "%BACKBLAZE%"
GROUP BY j.id WITH ROLLUP
You need to change the query to:
SELECT
...
SEC_TO_TIME(SUM(TIME_TO_SEC(l.time_elapsed))) as time_elapsed
Also, the line of LIKE '%...'
will make the query uber slow, because no indexes on this can be used.
If you are able to use MyISAM, you can use a fulltext index on those columns and use code like:
WHERE MATCH(mf.job_description,mf.customer_name,mf.customer_name,...)
AGAINST ('BACKBLAZE' IN NATURAL LANGUAGE MODE)
See:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
http://www.petefreitag.com/item/477.cfm
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_time-to-sec
It sounds like the problem is that multiple tasks can be in the same batch, and/or multiple parts can be in the same task. Say for example that your job has 3 parts, and each part has a task, and all 3 tasks are in the same batch. You'll add the time for that batch three times. But distinct won't work either, since if you have 5 different batches that all took 300 seconds they won't be considered distinct.
In a situation like this, a subquery is usually the way to go. Instead of joining directly with batch_log
, you would join with a subquery that selected distinct j.id
(or p.job_id
), l.batch_id
, and l.time_elapsed
(the first is for joining, the second just for proper calculation of distinct, and the third the actual value to use). Then you can sum l.time_elapsed
from there. This way each batch is counted exactly once.
batches(l) table doesn't have a field called time_elapsed tasks does so it's either
SUM(t.time_elapsed) as time_elapsed
-or-
SUM(l.actual_time) as time_elapsed
精彩评论