开发者

Should criteria be duplicated on subqueries

I have a query which actually runs two queries on a table. I query the whole table, a datediff and then a subquery which tells me the sum of hours each unit spent in certain operational steps. The main query limits the results to the REP depot so technically I don't need to put that same criteria on the subquery since repair_order is unique.

Would it be faster, slower or no difference to apply the depot filter on the subquery?

SELECT
  *,
  DATEDIFF(date_shipped, date_received) as htg_days,
  (SELECT SUM(t3.total_days) FROM report_tables.cycle_time_days as t3 WHERE t1.repair_order=t3.repair_order AND (operation='MFG' OR operation='ENG' OR operation='ENGH' OR operation='HOLD') GROUP BY t3.repair_order) as subt_days
FROM
  report_tables.cycle_time_days as t1
WHERE
  YEAR(t1.date_shipped)=2010
  AND t1.depot='REP'
GROUP BY
  repair_order
ORDER BY
  date_shipped;

I run into this with a lot of situations but I never know if it would be better to put the filter in the 开发者_如何学运维sub query, main query or both.


In this example, it would actually alter the query if you moved your WHERE clause to filter by REP into the subquery. So it wouldn't be about performance at that point, it would be about getting the same result set. In general, though, if you will get the same exact result set by moving a WHERE clause elsewhere in a complex query, it is better to do so at the most atomic level possible, ie, in the subquery. Then the subquery returns a smaller result set to the main query before the main query has to process it.


The answer to your question will vary depending on your schema, the complexity of your queries, the reliability of your data, etc. A general rule of thumb is to try to process the least amount of data possible, which generally means filtering it at the lowest level possible as well.

When you want to optimize a query the absolute number one place to start is to use the EXPLAIN output to see what optimizations the query parser was able to figure out and check to see what the weakest link is in the query plan. Resolve that, rinse, repeat.

You can also use explain's "extended" keyword to see the actual query it built to run which will reveal more about its usage of your criteria. In some cases, it will optimize away duplicate conditions between parent/subqueries. In other cases, it may push the conditions down from the parent in to the subquery. In some cases for (too) complex queries I've seen the it repeat the condition when it was only specified in the query once. Thankfully, you don't have to guess, mysql's explain plan will reveal all, albeit sometimes in cryptic ways.


I usually use a derived table as a "driver or aggregating" query then join that result back onto whatever table that i want to pull data from:

select
  t1.*,
  datediff(t1.date_shipped, t1.date_received) as htg_days,
  subt_days.total_days
from
 cycle_time_days as t1
inner join
( 
  -- aggregating/driver query  

  select 
   repair_order,
   sum(total_days) as total_days
  from 
   cycle_time_days
  where 
   year(date_shipped) = 2010 and depot = 'REP' and 
   operation in ('MFG','ENG','ENGH','HOLD') -- covering index on date, depot, op ???
  group by
   repair_order -- indexed ??
  having
   total_days > 14 -- added for demonstration purposes
  order by
   total_days desc limit 10

) as subt_days on t1.repair_order = subt_days.repair_order 
order by
  t1.date_shipped;
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜