How to Optimize a aggregate SQL?

2023-03-25 15:48 问答作者：

Assume a table schema like:

name amount_1, amount_2, cond_1, cond_2

the table have 500,000+rows.

How to optimize query like:

select 
  name
  , sum(amount_1) as total_1
  , sum(amount_2) as total_2
  , sum(amount_1+amount_2) as total 
from table_name 
where cond_1 in ('a', 'b') 
group by name 
order by total de开发者_运维问答sc 
limit 10;

Current the sql will take minutes to run.

You are likely to be missing indexes on these columns:

name for the grouping
cond_1 for the filtering

If you prefer to have a compound index only, I recommend making that an index on (cond_1, name) (not the other way round). Why? cond_1 should be the first column in the compound index as the filter predicate can profit only from the columns on the left of an index.

This is explained here, for instance: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

Indexing will help, plus you can rewrite the query slightly:

SELECT
  s.*, s.total_1 + s.total_2 AS total
FROM (
  SELECT
    name
    , SUM(amount_1) as total_1
    , SUM(amount_2) as total_2
  FROM table_name 
  WHERE cond_1 in ('a', 'b') 
  GROUP BY name ) s
ORDER BY total DESC 
LIMIT 10;

This will prevent a sum of (total_1 + total_2) over all the rows, but instead reuse the totals calculated in total_1 and total_2.

You can also try this variant, which may be slower or faster :-).
If you have an index on amount1 and amount2 and a substantial, but not huge fraction is 0 this may be much faster.

SELECT
  s.*, s.total_1 + s.total_2 AS total
FROM (
  SELECT
    name
    , SUM(amount_1) as total_1
    , SUM(amount_2) as total_2
  FROM table_name 
  WHERE cond_1 in ('a', 'b') AND (amount_1 <> 0 AND amount_2 <> 0)
  GROUP BY name ) s
ORDER BY total DESC 
LIMIT 10;

If you have lots of stuff in the IN, it may be faster to do

WHERE cond_1 BETWEEN 'a' AND 'z'

This exchanges 26 OR tests for 2 AND tests.

Except indexes metioned by Lukas Eder, I suppose that one of sum's is unnecessary, because if amount_1 and amount_2 are NOT NULL then you will have equation

total = total_1 + total_2

So, if you need total for ordering, you can loose total_2 and calculate it after query execution as (total - total_1)

Don't know if it save you much time, but looks like a little optimisation for me.

if this is an important part of your business consider OLAP, it's the faster way I know

继续阅读：query-optimization

How to Optimize a aggregate SQL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？