How to Optimize a aggregate SQL?
Assume a table schema like:
name amount_1, amount_2, cond_1, cond_2
the table have 500,000+rows.
How to optimize query like:
select
name
, sum(amount_1) as total_1
, sum(amount_2) as total_2
, sum(amount_1+amount_2) as total
from table_name
where cond_1 in ('a', 'b')
group by name
order by total de开发者_运维问答sc
limit 10;
Current the sql will take minutes to run.
You are likely to be missing indexes on these columns:
name
for the groupingcond_1
for the filtering
If you prefer to have a compound index only, I recommend making that an index on (cond_1, name)
(not the other way round). Why? cond_1
should be the first column in the compound index as the filter predicate can profit only from the columns on the left of an index.
This is explained here, for instance: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys
Indexing will help, plus you can rewrite the query slightly:
SELECT
s.*, s.total_1 + s.total_2 AS total
FROM (
SELECT
name
, SUM(amount_1) as total_1
, SUM(amount_2) as total_2
FROM table_name
WHERE cond_1 in ('a', 'b')
GROUP BY name ) s
ORDER BY total DESC
LIMIT 10;
This will prevent a sum of (total_1 + total_2) over all the rows, but instead reuse the totals calculated in total_1
and total_2
.
You can also try this variant, which may be slower or faster :-).
If you have an index on amount1 and amount2 and a substantial, but not huge fraction is 0 this may be much faster.
SELECT
s.*, s.total_1 + s.total_2 AS total
FROM (
SELECT
name
, SUM(amount_1) as total_1
, SUM(amount_2) as total_2
FROM table_name
WHERE cond_1 in ('a', 'b') AND (amount_1 <> 0 AND amount_2 <> 0)
GROUP BY name ) s
ORDER BY total DESC
LIMIT 10;
If you have lots of stuff in the IN
, it may be faster to do
WHERE cond_1 BETWEEN 'a' AND 'z'
This exchanges 26 OR tests for 2 AND tests.
Except indexes metioned by Lukas Eder, I suppose that one of sum
's is unnecessary, because if amount_1
and amount_2
are NOT NULL
then you will have equation
total = total_1 + total_2
So, if you need total
for ordering, you can loose total_2 and calculate it after query execution as (total - total_1)
Don't know if it save you much time, but looks like a little optimisation for me.
if this is an important part of your business consider OLAP, it's the faster way I know
精彩评论