weird execution plan of a SQL Server query
Context: SQL Server 2008. There are 2 tables to inner join. The fact table, which has 40 million rows, contains the patient key and the medications administered and other facts. There is a unique index (nonclustered) on medication key and patient key combined in that order. The dimension table is the medication list (70 rows). The join is to get the medication code (business code) based on medication key (surrogate key). Query:
SELECT a.PKey, a.SomeFact, b.MCode
FROM tblFact a
JOIN tblDIM b ON a.MKey = b.MKey
All the columns returned are integer. The above query runs in 7 minutes and its execution plan shows the index on (MKey,PKey) is used. The index was rebuilt right before the run. When I disabled the index on the fact table (or copy data to a new table with same structure but without index), the same query takes only 1:40 minutes.
IO Statistics are also stunning.
With index: Table 'tblFACT'. Scan count 70, logical reads 190296338, physical reads 685138, read-ahead reads 98713
Without index: Table 'tblFACT_copy'. Scan count 17, logical reads 468891, physical reads 0, r开发者_JAVA百科ead-ahead reads 419768
Question: why does it try to use the index and head down the inefficient path?
You need to add SomeFact
as an INCLUDE on the tblFact
index to make it covering.
Currently, the table will be accessed twice: once for the index and then again for a lookup to get SomeFact
either as a RID or key lookup (depends on if there is a clustered index)
This doesn't apply to tblDIM
because I assume that MKey
is the clustered index which makes it covering implicitly
In rare cases, the database chooses an incorrect execution plan. In this case, the index is used for the join, but since all data is fetched from both tables, it would be faster to just scan the whole table. The indexed version will be much faster if you add a WHERE clause to the query, because without indexes it will still need to scan the whole table, instead of grabbing just the handful of records it needs.
There may be directives to encourage the database not to use indexes or use different indexes, but I don't know SQL server that well.
Are your statistics up to date? Check with:
SELECT object_name = Object_Name(ind.object_id)
, IndexName = ind.name
, StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM SYS.INDEXES ind
order by
STATS_DATE(ind.object_id, ind.index_id) desc
Update with:
exec sp_updatestats;
精彩评论