Does a lot of repeated sum(x) in different columns make a SELECT slower?
I have a really big table with dozens of columns and lots of rows. Let's call this table FT. Every day I run a script that reads data from the FT table, performs some calculations, and updates a smaller table (table FA) that I use to generate reports.
The query that updates FA is something like:
INSERT INTO FA (A, B, C)
SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
FROM FT
GROUP BY ..
As I use sum(x) a lot of times, will it be faster if I create a temporary table with sum(x), sum(y) and sum(z) and use it to update my FA table?
Every database I know of has this kind of thing optimized so that the values are calculated only once.
If you're not certain, look at the execution plan and the reads for the current query and for your changed temp-table version.
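If you want to see for yourself, a minimal Oracle sketch would be something like the following (the grouping column is a placeholder, not taken from your schema):

EXPLAIN PLAN FOR
SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
FROM FT
GROUP BY some_grouping_column;   -- placeholder grouping column

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- In SQL*Plus, SET AUTOTRACE TRACEONLY STATISTICS lets you compare
-- logical reads (consistent gets) between the two variants.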
As a general rule of thumb, the time taken to retrieve the data from disk is the slowest operation a database does (particularly on a large table).
I would expect relatively straightforward arithmetic operations such as these to be negligible in comparison.
Benchmark your query against:
insert into fa (a, b, c)
select sum_x, sum_x * sum_y, sum_x + sum_z
from (select sum(x) as sum_x, sum(y) as sum_y, sum(z) as sum_z
from my_table
group by my_grouping_columns)
My strong suspicion is that Oracle has to build the intermediate result set (the grouped sums) first either way, and then transform it into the final result set.
It will definitely not be easier or faster to force Oracle to materialize the intermediate result set into a global temporary table; you're adding direct-path I/O without a good reason to do so. That said, if the intermediate result set is expensive to build and is used in multiple inserts, it may be worthwhile to materialize it into a temporary table.
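If you did go that route, the usual shape would be something like the sketch below (table and column names are assumptions, and the grouping is simplified to a single column):

create global temporary table ft_sums (
  grouping_col  varchar2(30),   -- assumed name/type, not your actual schema
  sum_x         number,
  sum_y         number,
  sum_z         number
) on commit preserve rows;

insert into ft_sums (grouping_col, sum_x, sum_y, sum_z)
select my_grouping_column, sum(x), sum(y), sum(z)
from my_table
group by my_grouping_column;

insert into fa (a, b, c)
select sum_x, sum_x * sum_y, sum_x + sum_z
from ft_sums;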
Considering that you have tagged this post with data-warehouse and datamart, I can only assume that your FT table is some kind of fact table and that the query looks something like:
select
CalendarMonth
, sum(x) as Tot_1
, sum(x) * sum(y) as Tot_2
, sum(x) + sum(z) as Tot_3
from FT as f
join dimDate as d on d.DateKey = f.DateKey
join dimUser as u on u.UserKey = f.UserKey
join dimProduct as p on p.ProductKey = f.ProductKey
where CalendarYear between 2008 and 2010
and Country = 'United States'
and ProductCategory = 'Cool Gadget'
and UserGender = 'Female'
group by CalendarMonth ;
Which is exactly how an aggregation over measures in a fact table should look.
Now, for reporting purposes, it seems that you have an aggregation table (FA) to speed up reports. I can only guess that the warehouse is loaded overnight and that your query prepares the aggregation sometime in the morning, before business hours, so it runs once per day -- or at least it is supposed to. If this query takes too long to run, consider adding a few key columns to your aggregation table (FA) -- usually DateKey -- and then updating the FA table periodically.
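A rough sketch of that daily-grain idea (the FA column names and the bind variable for yesterday's key are assumptions); keeping the plain sums additive means they can still be rolled up per month later:

insert into FA (DateKey, sum_x, sum_y, sum_z)
select f.DateKey, sum(f.x), sum(f.y), sum(f.z)
from FT f
where f.DateKey = :yesterday_key   -- hypothetical bind variable
group by f.DateKey ;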
For example, if you have 10,000 sales per day, then the above query sums roughly 300,000 rows for each month. If the aggregation table is aggregated per day, then updating it takes a sum over 10,000 rows once per day, and a monthly report only has to sum about 30 rows.
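The monthly report would then read only those pre-aggregated rows, something like the sketch below (dimension filters are omitted for brevity; supporting them would mean carrying those keys in FA as well):

select
  CalendarMonth
, sum(fa.sum_x)                 as Tot_1
, sum(fa.sum_x) * sum(fa.sum_y) as Tot_2
, sum(fa.sum_x) + sum(fa.sum_z) as Tot_3
from FA fa
join dimDate d on d.DateKey = fa.DateKey
group by CalendarMonth ;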
To summarize: in order to speed up fact-aggregation queries, focus on the number of rows being aggregated, not on the aggregate functions. Also, make sure that the dimension tables have indexes on the columns mentioned in the WHERE clause of the query.
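For example, based on the WHERE clause of the sample query above (index names and column choices are assumptions):

create index ix_dimdate_calendaryear on dimDate (CalendarYear);
create index ix_dimproduct_category on dimProduct (ProductCategory);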
Admittedly, I may have assumed too much here, so this may or may not be helpful.