开发者

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.

In my database, there is a table that is structured as followed:

Table tab (
  rowid NUMBER(11),
  unique_id_string VARCHAR2(2000),
  year NUMBER(4),
  dynamic_col_1 NUMBER(11),
  dynamic_col_1_text NVARCHAR2(2000)
 ) TABLESPACE tabspace_data;

I have created two indexes:

CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;

The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:

SELECT distinct
 "sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM 
(
    SELECT "tab".*  FROM "tab"
    where "tab".year = 2011
) "sub_select"

Unfortunately, the query needs around 1 hour to execute, although I created the both indexes described above. The explain 开发者_运维问答plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?

As an experiment, I tested the following SQL command:

SELECT DISTINCT
 "dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
 FROM "tab"

Even in this case, the index is not used and a full table scan is performed.

In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text". The whole index file has a size of about 50 GB.

A few more informations:

  • The database is Oracle 11g installed on my local computer.
  • I use Windows 7 Enterprise 64bit.
  • The whole index is split over 3 dbf files with about 50GB size.

I would really be glad, if someone could tell me how to make Oracle use the index in the first query. Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.

Thanks in advance.

[01.10.2011: UPDATE]

I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.


Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab has more than 5% of the data in 2011 it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year as the second column this will make the index even slower.

If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:

  1. Run an explain plan with and without and index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
  2. If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
  3. If the time is off, check your workload statistics.
  4. Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.

Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.


Your index should be:

CREATE INDEX Index_year 
ON tab (year) 
TABLESPACE tabspace_index;

Also, your query could just be:

SELECT DISTINCT
       dynamic_col_1 "AS_dynamic_col_1",
       dynamic_col_1_text "AS_dynamic_col_1_text"
  FROM tab
 WHERE year = 2011;

If your index was created solely for this query though, you could create it including the two fetched columns as well, then the optimiser would not have to go to the table for the query data, it could retrieve it directly from the index making your query more efficient again.

Hope it helps...


I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.


Your second test query:

SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"

would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.

Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.


I don't know if it's relevant, but I tested the following query:

SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'

The explain plan for that query show that Oracle uses an index scan in this scenario.

The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?

01.10.2011: UPDATE]

I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.


Try this:

1) Create an index on year field (see Ollie answer).

2) And then use this query:

SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab 
WHERE ID (SELECT ID FROM tab WHERE year=2011)

or

SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab 
WHERE ID (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text

Maybe it will help you.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜