开发者

oracle: tune correlated subqueries in select clause

On Oracle 10gR2, given the开发者_如何学Go following query, it takes forever to run. This is because of all the correlated subqueries in the select clause. There has got to be a better way. I'm thinking that rewriting the correlated subqueries as joins would do the trick, or building the query in such a way as to only need one correlated subquery, or possibly using some type of analytic function, but so far it is beyond me. Any help would be much appreciated.

Here is the query:

SELECT COL_1,
       TAB_1.COL_2                                 AS REPORT,
       (SELECT COL_3
        FROM   TAB_2
        WHERE  TAB_2.COL_1 = TAB_1.COL_1)          AS DEPOT,
       (SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES,
       (SELECT COUNT(DISTINCT( Trim(COL_10) ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_AFFECTED,
       (SELECT COUNT(TAB_3.COL_7)
        FROM   TAB_3
        WHERE  TAB_3.COL_7 = 1
               AND TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_CATEGORY1,
       (SELECT COUNT(TAB_3.COL_4)
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS U_SHORTAGES,
       (SELECT COUNT(TAB_3.COL_10)
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS U_AFFECTED,
       (SELECT COUNT(DISTINCT( Trim(TAB_3.COL_7) ))
        FROM   TAB_3
        WHERE  TAB_3.COL_7 = 1
               AND TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS UNSRV_CAT1,
       To_char(TAB_1.COL_11, 'MM/DD/YY')           AS REPORT_DATE
FROM   TAB_1;  

For me, to make things less complicated I tried to rewrite just the following, but I'm still at a loss:

    SELECT COL_1,
       (SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES
FROM   TAB_1;  


This is quite a loaded question.. There is not enough info here (e.g. execution plan, record counts, available indexes, pysical location of data in table, etc) to be able to tune the query properly. The best anyone can do is give a guess. That being said.. here is my best guess:

select distinct
    tab_1.col_1,
    tab_1.col_2 AS report,
    tab_2.col_3 AS depot,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then tab_3.col_4 end)) over (partition by tab_1.col_1, tab_1.col_2) AS p_shortages,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then TRIM (tab_3.col_10) end)) over (partition by tab_1.col_1, tab_1.col_2)  AS parts_affected,
    COUNT (case when tab_3.col_8 = 'Parts Shortage'  and tab_3.col_7 = 1 then tab_3.col_7 end) over (partition by tab_1.col_1, tab_1.col_2)  AS parts_category1,
    COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_4 end) over (partition by tab_1.col_1, tab_1.col_2)  AS u_shortages,
    COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_10 end) over (partition by tab_1.col_1, tab_1.col_2)  AS u_affected,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Unsrv Asset' and tab_3.col_7 = 1 then TRIM(tab_3.col_7) end)) over (partition by tab_1.col_1, tab_1.col_2)  AS unsrv_cat1,
    TO_CHAR(tab_1.col_11, 'MM/DD/YY') AS report_date
from tab_1
    left outer join tab_2
        on tab_2.col_1 = tab_1.col_1
    left outer join tab_3
        on     tab_3.col_9 = tab_1.col_1
        AND tab_3.col_6 = tab_1.col_2 
        AND tab_3.col_5 IS NULL;

After looking closer at the query above to explain what I did, I modified it further. This was because I was grouping by TO_CHAR(tab_1.col_11, 'MM/DD/YY'), but saw this wasn't part of the correlated subquery criteria, so it had to change (unless the combination of tab_1.col_1 and col_2 are unique by themselves).

So now to try to explain this:

Basically what is happening in the original query is that for every row of tab_1, you are running several queries on tab_3. So instead of that, I changed it to an outer join on tab_3. Since I don't know the data, this had to be an outer join because a correlated subquery won't eliminate any rows from the final output, where an inner join might. I just joined to tab_3 once, since all of the subqueries were joining tab_3 back to tab_1 using the same fields. I just moved the specific logic for each subquery into a case statement inside the count so that the case would return null (and thus not get counted) if the criteria wasn't met. This newest version uses aggregate functions to get my calculations at the correct level (tab_1 col_1 and col_2 which is what the original subquery was joining based on). Since converting the subquery to an outer join could have possibly created more rows (if there isn't a 1-to-1 match between the tables) I added the distinct to only get one row for each row in tab_1. All rows for each row in tab_1 should be the same. If there were already duplicates in tab_1, you will have to do something a little more in-depth to get keep the number of records the same.

Hopefully this makes sense. If you have questions, feel free to ask and I will do my best to explain further.

--------------------------- More explanation

@shawno: Kind of, but I don't really think of it much as a loop.. Using your simplified example, let's pretend your tables look like this:

TAB_1:

col_1           col_2
--------        ---------
A               B
C               D

TAB_3:

col_9           col_6          col_4
--------        ---------      ---------
A               B              X
A               B              Y
A               B              Z
C               D              X
C               D              X

Using the subquery method, you are looking at each row of tab_1 and then running a query against tab_3. So you would do:

for row col_1 = A, col_2 = B, run a select count(distinct(col_4)) on tab_3 where col_9 = A and col_6 = B. This returns the value 3, which is the value the subquery returns.

for row col_1 = C, col_2 = D, run a select count(distinct(col_4)) on tab_3 where col_9 = C and col_6 = D. This returns the value 1, which is the value the subquery returns.

Using the join method, you first join the tables giving you data like:

col_1           col_2          col_9           col_6          col_4
--------        ---------      --------        ---------      ---------
A               B              A               B              X
A               B              A               B              Y
A               B              A               B              Z
C               D              C               D              X
C               D              C               D              X

So now you just have to make a query based on that data, doing a Count(distinct(col_4)) for each value of col_1, col_2. If you know what your data looks like, you can create a more efficient query, but the idea remains the same.

Hope this makes it a little more clear!


The most efficient way to tune corelated subqueries is to have proper index on joining columns. Hence, I would like to suggest you to have proper indexes on the joining columns.

A composite index on Tab3 for col9, col6.

A composite index on Tab1 for col1 and col2.

An index on Tab3.col5 (if it has most nulls, else not create this one)

A function based index on tab3.col8 create index <index_name> on tab3(upper(col8))

and change the query to use UPPER function say..UPPER(TAB_3.COL_8) = 'PARTS SHORTAGE'...

Remember to execute this statement before executing the query otherwise function based index is not useful.

alter session set query_rewrite_enabled=true (this makes optimiser to use FBI).

Then execute the query. Check the explain plan if the query is using all indexes properly... else add /*+Index() */ hints to use the indexes properly by the query.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜