开发者

How do I get a SUM to calculate properly with a join?

So I'm trying to count the number of parts, number of tasks, the quantity in each job and the time that it took to manufacture each job but I'm getting some funky results. If I run this:

SELECT
  j.id, 
    mf.special_instructions,
  count(distinct p.id) as number_of_different_parts,
  count(distinct t.id) as number_of_tasks,
  SUM(distinct j.quantity) as number_of_assemblies,
  SUM(l.time_elapsed) as time_elapsed

FROM
  sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
  mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
  j.id = p.job_id
INNER JOIN ramses.tasks t on
  p.id = t.part_id
INNER JOIN ramses.batch_log l on
  t.batch_id = l.batch_id

WHERE 
  mf.job_description                LIKE "%BACKBLAZE%" OR
  mf.customer_name                  LIKE "%BACKBLAZE%" OR
  mf.customer_ref                   LIKE "%BACKBLAZE%" OR
  mf.technical_company_name LIKE "%BACKBLAZE%" OR
  mf.description                        LIKE "%BACKBLAZE%" OR
  mf.name                                   LIKE "%BACKBLAZE%" OR
  mf.enclosure_style                LIKE "%BACKBLAZE%" OR 
    mf.special_instr开发者_JAVA百科uctions     LIKE "%BACKBLAZE%"
Group by j.id

and I now get accurate parts and tasks numbers but the time_elapsed sum isn't correct. What could the problem be?

When I try it with distinct I get a veeeeery low number (like something between 1 and 30 when I'm looking for something closer to 10,000.)

UPDATE: here is the create code:

http://pastebin.com/nbhU9rYh

http://pastebin.com/tdmAkNr4

http://pastebin.com/0TFCUaeQ

http://pastebin.com/fugr8C9U

http://pastebin.com/Zq0bKG2L

http://pastebin.com/k5rESUrq

The relationships are like this:

  • mf_job info is linked to a job
  • jobs have parts
  • parts have tasks
  • tasks are in batches
  • batch_log is a table with all of the starts and stops for the batches of tasks, it has a start_time and a stop_time and a time_elapsed.

I am trying to get all of the time_elapsed from the batch_log for each mf_job with the word backblaze in one of it's fields along with the number of parts, tasks and assemblies. This all needs to be grouped by job.id or mf_job.id


Try and rewrite the query to:

SELECT
  j.id, 
  mf.special_instructions,
  count(p.id) as number_of_different_parts,
  count(t.id) as number_of_tasks,
  SUM(j.quantity) as number_of_assemblies,
  SEC_TO_TIME(SUM(l.seconds_elapsed)) as time_elapsed

FROM
  sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
  mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
  j.id = p.job_id
INNER JOIN ramses.tasks t on
  p.id = t.part_id
INNER JOIN (
            SELECT rl.batch_id
                  , SUM(TIME_TO_SEC(rl.time_elapsed)) as seconds_elapsed
            FROM ramses.batch_log rl 
            GROUP BY rl.batch_id
            ) l ON (t.batch_id = l.batch_id)

WHERE 
  mf.job_description                LIKE "%BACKBLAZE%" OR
  mf.customer_name                  LIKE "%BACKBLAZE%" OR
  mf.customer_ref                   LIKE "%BACKBLAZE%" OR
  mf.technical_company_name         LIKE "%BACKBLAZE%" OR
  mf.description                    LIKE "%BACKBLAZE%" OR
  mf.name                           LIKE "%BACKBLAZE%" OR
  mf.enclosure_style                LIKE "%BACKBLAZE%" OR 
  mf.special_instructions           LIKE "%BACKBLAZE%"
GROUP BY j.id WITH ROLLUP


You need to change the query to:

SELECT
  ...
  SEC_TO_TIME(SUM(TIME_TO_SEC(l.time_elapsed))) as time_elapsed

Also, the line of LIKE '%...' will make the query uber slow, because no indexes on this can be used.

If you are able to use MyISAM, you can use a fulltext index on those columns and use code like:

WHERE MATCH(mf.job_description,mf.customer_name,mf.customer_name,...) 
      AGAINST ('BACKBLAZE' IN NATURAL LANGUAGE MODE)

See:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
http://www.petefreitag.com/item/477.cfm
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_time-to-sec


It sounds like the problem is that multiple tasks can be in the same batch, and/or multiple parts can be in the same task. Say for example that your job has 3 parts, and each part has a task, and all 3 tasks are in the same batch. You'll add the time for that batch three times. But distinct won't work either, since if you have 5 different batches that all took 300 seconds they won't be considered distinct.

In a situation like this, a subquery is usually the way to go. Instead of joining directly with batch_log, you would join with a subquery that selected distinct j.id (or p.job_id), l.batch_id, and l.time_elapsed (the first is for joining, the second just for proper calculation of distinct, and the third the actual value to use). Then you can sum l.time_elapsed from there. This way each batch is counted exactly once.


batches(l) table doesn't have a field called time_elapsed tasks does so it's either

SUM(t.time_elapsed) as time_elapsed

-or-

SUM(l.actual_time) as time_elapsed
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜