Optimizing IN clause
I am using Oracle 10g. Here is my query
select * from Entries
where RefKey in (select RefKey
from Entries
where KeyStat = 1)
and RefKey = Key;
Here RefKey, Key and KeyStat all are indexed. The table is partitioned on another column which is not being used here. In this query, i am selecting master key (if RefKey = Key then master) currently active (KeyStat = 1). Here is the execution plan of this query using SQLTools 1.21 RC3.
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 1 | NESTED LOOPS SEMI | | 1 | 270 | 218K (1)| 00:43:37 | | |
| 2 | PARTITION RANGE ALL | | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 3 | TABLE ACCESS FULL | ENTRIES | 1 | 262 | 218K (1)| 00:43:37 | 1 | 12 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| ENTRIES | 10M| 77M| 3 (0)| 00:00:01 | ROWID | ROWID |
|* 5 | INDEX RANGE SCAN | IND_ENTR_REFKEY| 1 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------
I am concerned about ID = 3 "TABLE ACCESS FULL". If all the columns being used in this query are indexed then why oracle is doing full table scan.
How this can be optimized? If i put some values in inner query, it returns much faster.
To explain why the sub-query is necessary: I am 开发者_高级运维selecting the whole batch having at least one active key. Refkey is not unique; for instance:
Key=1, RefKey=1, Stat=1
Key=2, RefKey=1, Stat=0
Key=3, RefKey=2, Stat=1
"I am concerned about ID = 3 "TABLE ACCESS FULL". If all the columns being used in this query are indexed then why oracle is doing full table scan."
The optimizer is ignoring the index on KEYSTAT. I would guess this is because KEYSTAT is not very selective (relatively few distinct values) and/or because those values are evenly distributed across the entire range of the ENTRIES table. If a query is going to hit pretty much every block in a table, FULL TABLE SCAN is the best path.
This guess is validated by the increased speed you get by filtering the sub-query.
As the others have suggested, refactoring your statement to remove the sub-query would be the best way to improve performance.
"There would be millions of entries having KeyStat= 0, only few in 1000s will have 1 so using index will be beneficial."
Skewed datadistribution is often a source of performance problems. You see, the thing is, the database doesn't know that KEYSTAT=1 is vastly more selective than KEYSTAT=0. Unless we tell it, which is why you might want to consider creating histograms when you gather statistics on that index. Find out more.
Note that histograms can cause problems as well as solve them, especially with queries which use bind variables instaead of literals. So benchmark in a sandpit before putting them into production.
Maybe I miss something but shouldn't this yield the same result?
select *
from Entries e
where e.KeyStat = 1
and e.RefKey = e.Key
Can you do a
EXPLAIN PLAN FOR select * from Entries
where RefKey in (select RefKey
from Entries
where KeyStat = 1)
and RefKey = Key;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
You should get an explain plan with a bunch of extra info at the end. Specifically, you are interested in that operation 3. Not only is it doing a full table scan, but it is only giving a rows/cardinality of 1. Really that means it doesn't think it is going to find anything (as it never gives a zero value there).
You can tell that it doesn't expect to ever find a row because although operation 4 talks about 10 million rows, it gives a cost of '3' and of that '2' is the index cost. It doesn't expect the scan from '3' to find anything so it never expects to plug a value into the index scan so it never is going to see 10M rows.
So I see two problems. Firstly, it doesn't use an index on KeyStat. Secondly, it is underestimating the number of rows returned by the match. the first might be caused by a datatype mismatch. Maybe KeyStat is a character value. Maybe it isn't the leading column in the index. Maybe the index has been disabled (during an ETL job ?).
The odd estimate is also suggestive. If the column was the leading column in an index, I'd expect the stats to have worked out that the max_val of the column is 1. If it thinks the max_val is zero, that would explain the "you ain't going to find anything" estimate but also that it didn't have an index to use to quickly find the max_val.
精彩评论