开发者

How to Optimize a aggregate SQL?

Assume a table schema like:

name amount_1, amount_2, cond_1, cond_2

the table have 500,000+rows.

How to optimize query like:

select 
  name
  , sum(amount_1) as total_1
  , sum(amount_2) as total_2
  , sum(amount_1+amount_2) as total 
from table_name 
where cond_1 in ('a', 'b') 
group by name 
order by total de开发者_运维问答sc 
limit 10;

Current the sql will take minutes to run.


You are likely to be missing indexes on these columns:

  • name for the grouping
  • cond_1 for the filtering

If you prefer to have a compound index only, I recommend making that an index on (cond_1, name) (not the other way round). Why? cond_1 should be the first column in the compound index as the filter predicate can profit only from the columns on the left of an index.

This is explained here, for instance: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys


Indexing will help, plus you can rewrite the query slightly:

SELECT
  s.*, s.total_1 + s.total_2 AS total
FROM (
  SELECT
    name
    , SUM(amount_1) as total_1
    , SUM(amount_2) as total_2
  FROM table_name 
  WHERE cond_1 in ('a', 'b') 
  GROUP BY name ) s
ORDER BY total DESC 
LIMIT 10;

This will prevent a sum of (total_1 + total_2) over all the rows, but instead reuse the totals calculated in total_1 and total_2.

You can also try this variant, which may be slower or faster :-).
If you have an index on amount1 and amount2 and a substantial, but not huge fraction is 0 this may be much faster.

SELECT
  s.*, s.total_1 + s.total_2 AS total
FROM (
  SELECT
    name
    , SUM(amount_1) as total_1
    , SUM(amount_2) as total_2
  FROM table_name 
  WHERE cond_1 in ('a', 'b') AND (amount_1 <> 0 AND amount_2 <> 0)
  GROUP BY name ) s
ORDER BY total DESC 
LIMIT 10;

If you have lots of stuff in the IN, it may be faster to do

WHERE cond_1 BETWEEN 'a' AND 'z'

This exchanges 26 OR tests for 2 AND tests.


Except indexes metioned by Lukas Eder, I suppose that one of sum's is unnecessary, because if amount_1 and amount_2 are NOT NULL then you will have equation

total = total_1 + total_2

So, if you need total for ordering, you can loose total_2 and calculate it after query execution as (total - total_1)

Don't know if it save you much time, but looks like a little optimisation for me.


if this is an important part of your business consider OLAP, it's the faster way I know

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜