Getting rid of full index scan

2023-01-05 18:34 问答作者：

The following query performs badly because of a full non-clustered index scan of 6.5 million records in P4FileReleases followed by a hash join. I'm looking for possible reasons the optimizer picks a scan over a seek.

SELECT p4f.FileReleaseID 
   FROM P4FileReleases p4f
   INNER JOIN AnalyzedFileView af 
      ON p4f.FileRelease = (af.path+'#'+cast(af.revision as varchar))  
   WHERE (af.tracked_change_id = 1)

From what I can tell, I see no reason for the optimizer to pick a scan of P4FileReleases. The WHERE clause limits the size of the right dataset to about 1K of records and the optimizer should know it (see the histogram below).

If fact, if I take the view data and throw it into a heap table (same structure as the indexed view), then the query is performed with an index seek on the larger table and an inner join loop instead of a hash join (and the total cost drops from 145 to around 1).

Any ideas on what might be throwing the optimizer off?

Details. Sql Server 2008 (v. 10.0.2757.0).

P4FileReleases table Holds 6.5 million records

CREATE TABLE [dbo].[P4FileReleases](
    [FileReleaseID] [int] IDENTITY(1,1) NOT NULL,
    [FileRelease] [varchar](254) NOT NULL,
    -- 5 more fields 
 CONSTRAINT [CIX_P4FileReleases_FileReleaseID_PK] PRIMARY KEY CLUSTERED 
(
    [FileReleaseID] ASC
),
CONSTRAINT [NCIX_P4FileReleases_FileRelease] UNIQUE N开发者_开发问答ONCLUSTERED 
(
    [FileRelease] ASC
)

AnalyzedFileView is an indexed view with statistics enabled and up-to-date.

It has four columns:

   key int (int, PK) - clustered index
   tracked_change_id (int, FK) - non-unique, non-clustered index (covering 'path', 'revision')
   path (nvarchar(1024), null) 
   revision (smallint, null)

tracked_change_id histogram:

1   0   1222    0   1
4   0   787     0   1
8   0   2754    0   1
12  0   254     0   1
13  0   34      0   1

Query Plan

  |--Parallelism(Gather Streams)
       |--Hash Match(Inner Join, HASH:([Expr1011])=([Expr1010]), RESIDUAL:([Expr1010]=[Expr1011]))
            |--Bitmap(HASH:([Expr1011]), DEFINE:([Bitmap1015]))
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Expr1011]))
            |         |--Compute Scalar(DEFINE:([Expr1011]=([qpsitools].[dbo].[analyzed_file_view].[path]+N'#')+CONVERT_IMPLICIT(nvarchar(30),CONVERT(varchar(30),[qpsitools].[dbo].[analyzed_file_view].[revision],0),0)))
            |              |--Index Seek(OBJECT:([qpsitools].[dbo].[analyzed_file_view].[tracked_change_id]), SEEK:([qpsitools].[dbo].[analyzed_file_view].[tracked_change_id]=(1)) ORDERED FORWARD)
            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Expr1010]), WHERE:(PROBE([Bitmap1015],[Expr1010])))
                 |--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(nvarchar(254),[Blueprint].[dbo].[P4FileReleases].[FileRelease] as [p4f].[FileRelease],0)))
                      |--Index Scan(OBJECT:([Blueprint].[dbo].[P4FileReleases].[NCIX_P4FileReleases_FileRelease] AS [p4f]))

You are joining varchar column p4f.FileRelease with an nvarchar column (af.path). Since the data types don't match, SQL has to convert one's type to the other's (and of course it can't go from nvarchar to varchar). In converting af.path to nvarchar, it loses the ability to use the index to lookup/filter those values, resulting in the need to scan and convert all possible rows.

The best solution is to store the data as matching data types (change column p4f.FileRelase to nvarchar, or af.path to varchar). Since no one ever gets to modify existing database structures, a work-around might be to explicitly cast af.path to varchar in the query. Test it and see... though of course you can't do this if the data truly requires double-byte formatting.

your problem is not the WHERE but the JOIN, you are getting an implicit conversion and a scan on the JOIN, on the WHERE condition you are getting a SEEK

ON p4f.FileRelease = (af.path+'#'+cast(af.revision as varchar))

Parallelism could also be a problem, try adding MAXDOP=1

Are your statistics up to date? Is there excessive fragmentation?

Try moving "af.tracked_change_id = 1" into the join clause.

INNER JOIN AnalyzedFileView af 
ON p4f.FileRelease = (af.path+'#'+cast(af.revision as varchar))
AND af.tracked_change_id = 1

WHERE is applied after the INNER JOIN

Philip Kelley spotted the problem. It was a datatype mismatch between varchar in P4FileReleases and nvarchar in AnalyzedFileView.

继续阅读：query-optimization sql-server sql-server-2008 tsql

Getting rid of full index scan

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？