Oracle query using more temp space than usual
One of our customers informs us that a process cannot complete because they run out of temp space (20GB). The process is part of standard software and we usually need no more than 300MB of temp space.
We started to monitor the temp space (Metalink note: 364417.1) and found the offending query. We also ran the process with sql trace on, both on our clients database and on our database. (Both Oracle 10.2.0.5, exact same version of our application, exact same data)
Here is the difference:
Trace from our database:
SELECT OBC1.BCT_ID BCT_ID
FROM
NGG_OBJECTBASISCOMPONENT OBC1 ,NGG_OBJECTBASISCOMPONENT OBC2 ,NGG_OBJECT
OBJ1 ,NGG_OBJECT OBJ2 ,NGG_LAAGBASISCOMPONENT LBC1 ,NGG_LAAGBASISCOMPONENT
LBC2 WHERE OBC1.BCT_ID = OBC2.BCT_ID AND OBC1.OBJ_ID = OBJ1.ID AND
OBC2.OBJ_ID = OBJ2.ID AND OBJ1.ODE_ID IS NULL AND OBJ2.ODE_ID IS NULL AND
OBC1.LBC_ID = LBC1.ID AND OBC2.LBC_ID = LBC2.ID AND OBJ1.ID > OBJ2.ID AND
OBJ1.TRE_ID_V IS NULL AND OBJ2.TRE_ID_V IS NULL AND LBC1.LDE_ID = :B2 AND
LBC1.LDE_ID = LBC2.LDE_ID AND OBJ1.TRE_ID_O = :B1 AND LBC1.FOUT = 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 24 0.01 0.00 0 0 0 0
Execute 26 0.04 0.04 0 0 0 0
Fetch 26 0.15 0.14 0 11932 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 76 0.21 0.18 0 11932 0 0
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 63 (recursive depth: 2)
Rows Row Source Operation
------- ---------------------------------------------------
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=181 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=155 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=133 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=110 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=91 us)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECT (cr=3 pr=0 pw=0 time=65 us)
0 INDEX RANGE SCAN NGG_OBJ_TRE_FK_O_I (cr=3 pr=0 pw=0 time=40 us)(object id 49579)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECTBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX RANGE SCAN NGG_OBC_OBJ_FK_I (cr=0 pr=0 pw=0 time=0 us)(object id 49586)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECTBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX RANGE SCAN NGG_OBC_BCT_FK_I (cr=0 pr=0 pw=0 time=0 us)(object id 49585)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_OBJ_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49596)
0 TABLE ACCESS BY INDEX ROWID NGG_LAAGBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_LBC_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49591)
0 TABLE ACCESS BY INDEX ROWID NGG_LAAGBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_LBC_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49591)
********************************************************************************
The trace from our clients database is the same except for the amount of rows fetched:
********************************************************************************
SELECT OBC1.BCT_ID BCT_ID
FROM
NGG_OBJECTBASISCOMPONENT OBC1 , NGG_OBJECTBASISCOMPONENT OBC2 , NGG_OBJECT
OBJ1 , NGG_OBJECT OBJ2 , NGG_LAAGBASISCOMPONENT LBC1 ,
NGG_LAAGBASISCOMPONENT LBC2 WHERE OBC1.BCT_ID = OBC2.BCT_ID AND
OBC1.OBJ_ID = OBJ1.ID AND OBC2.OBJ_ID = OBJ2.ID AND OBJ1.ODE_ID IS NULL
AND OBJ2.ODE_ID IS NULL AND OBC1.LBC_ID = LBC1.ID AND OBC2.LBC_ID =
LBC2.ID AND OBJ1.ID > OBJ2.ID AND OBJ1.TRE_ID_V IS NULL AND
OBJ2.TRE_ID_V IS NULL AND LBC1.LDE_ID = :b1 AND LBC1.LDE_ID =
LBC2.LDE_ID AND OBJ1.TRE_ID_O = :b2 AND LBC1.FOUT = 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 24 0.00 0.00 0 0 0 0
Execute 26 0.04 0.04 0 0 0 0
Fetch 26 2414.90 2521.04 258210 624771631 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 76 2414.95 2521.09 258210 624771631 0 0
Misses in library cache during parse: 2
Misses in library cache during execute: 2
Optimizer mode: ALL_ROWS
Parsing user id: 64 (recursive depth: 2)
Rows Row Source Operation
------- ---------------------------------------------------
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=51 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=47 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=43 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=42 us)
0 NESTED LOOPS (cr=3 pr=0 pw=0 time=38 us)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECT (cr=3 pr=0 pw=0 time=35 us)
0 INDEX RANGE SCAN NGG_OBJ_TRE_FK_O_I (cr=3 pr=0 pw=0 time=31 us)(object id 49947)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECTBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX RANGE SCAN NGG_OBC_OBJ_FK_I (cr=0 pr=0 pw=0 time=0 us)(object id 49954)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECTBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX RANGE SCAN NGG_OBC_BCT_FK_I (cr=0 pr=0 pw=0 time=0 us)(object id 49953)
0 TABLE ACCESS BY INDEX ROWID NGG_OBJECT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_OBJ_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49964)
0 TABLE ACCESS BY INDEX ROWID NGG_LAAGBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_LBC_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49959)
0 TABLE ACCESS BY INDEX ROWID NGG_LAAGBASISCOMPONENT (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN NGG_LBC_PK (cr=0 pr=0 pw=0 time=0 us)(object id 49959)
Elapsed times include waiting on fo开发者_运维技巧llowing events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
direct path write temp 17522 0.00 0.06
direct path read temp 17214 0.00 0.07
latch: cache buffers chains 1 0.00 0.00
Is this query producing a cartesian product? Why only on this particular database instance?
What more can I do to figure out what is going on?
I wonder if that plan is what an EXPLAIN option in TKPROF generates rather than the actual execution plan as it runs. Try capturing the query as it's run (or right as it fails) through querying AWR data or v$sql / v$sql_plan - see this thread on asktom for more.
(I say this because there's nothing in the plan as presented which would cause temp space to be used, assuming none of those are global temp tables)
Firstly, looking at this line "Fetch 26 2414.90 2521.04
" you can see almost all of the time is spent in CPU. This is backed up by the listed waits being just 0.06 and 0.07. Totally insignificant.
So this query plan is NOTHING to do with temp, and all to do with CPU. Broadly, the query plan seems to follow the following join order.
SELECT OBC1.BCT_ID BCT_ID
FROM NGG_OBJECT OBJ1
join NGG_OBJECTBASISCOMPONENT OBC1 on (OBC1.OBJ_ID = OBJ1.ID)
join NGG_OBJECTBASISCOMPONENT OBC2 on (OBC1.BCT_ID = OBC2.BCT_ID )
join NGG_OBJECT OBJ2 on (OBC2.OBJ_ID = OBJ2.ID )
join NGG_LAAGBASISCOMPONENT LBC1 on (OBC1.LBC_ID = LBC1.ID)
join NGG_LAAGBASISCOMPONENT LBC2 on (OBC2.LBC_ID = LBC2.ID)
WHERE OBJ1.TRE_ID_O = :b2
AND LBC1.LDE_ID = LBC2.LDE_ID
AND OBJ1.ID > OBJ2.ID
AND OBJ1.ODE_ID IS NULL
AND OBJ2.ODE_ID IS NULL
AND OBJ1.TRE_ID_V IS NULL
AND OBJ2.TRE_ID_V IS NULL
AND LBC1.LDE_ID = :b1
AND LBC1.FOUT = 0
Given the lack of waits for disk reads, I suspect you've got all the indexes in memory, and probably the tables too. I suspect a lot of rows are matching that first TRE_ID_O filter and that each match is then hitting the NGG_OBJECT table and possibly getting excluded by either of the OBJ1.ODE_ID IS NULL / OBJ1.TRE_ID_V IS NULL conditions (or both).
What are the table volumes ? Are there in the order of 10 million rows matching the TRE_ID_O value.
I wouldn't be surprised if there was an alternative plan using hash joins (which would use temp)
精彩评论