SQL Server query performance - removing need for Hash Match (Inner Join)
I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.
select t1.PrimaryKeyId, t1.AdditionalColumnId
from TableOne t1
join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
join TableThree t3 on t1.PrimaryKeyId = t3.ForeignKeyId
join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
where
t1.StatusId = 1
and t开发者_如何学C5.TypeId = 68
There are indexes on all the join columns, however the performance is not great. Inspecting the query plan reveals a lot of Hash Match (Inner Joins) when really I want to see Nested Loop joins.
The number of records in each table is as follows:
select count(*) from TableOne
= 64393
select count(*) from TableTwo
= 87245
select count(*) from TableThree
= 97141
select count(*) from TableFour
= 116480
select count(*) from TableFive
= 62
What is the best way in which to improve the performance of this type of query?
First thoughts:
- Change to EXISTS (changes equi-join to semi-join)
- You need to have indexes on t1.StatusId, t5.TypeId and INCLUDE t1.AdditionalColumnID
I wouldn't worry about your join method yet...
Personally, I've never used a JOIN hint. They only work for the data, indexes and statistics you have at that point in time. As these change, your JOIN hint limits the optimiser
select t1.PrimaryKeyId, t1.AdditionalColumnId
from
TableOne t1
where
t1.Status = 1
AND EXISTS (SELECT *
FROM
TableThree t3
join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
WHERE
t1.PrimaryKeyId = t3.ForeignKeyId
AND
t5.TypeId = 68)
AND EXISTS (SELECT *
FROM
TableTwo t2
WHERE
t1.ForeignKeyId = t2.PrimaryKeyId)
Index for tableOne.. one of
(Status, ForeignKeyId) INCLUDE (AdditionalColumnId)
(ForeignKeyId, Status) INCLUDE (AdditionalColumnId)
Index for tableFive... probably (typeID, PrimaryKeyId)
Edit: updated JOINS and EXISTS to match question fixes
SQL Server is pretty good at optimizing queries, but it's also conservative: it optimizes queries for the worst case. A loop join typically results in an index lookup and a bookmark lookup for for every row. Because loop joins cause dramatic degradation for large sets, SQL Server is hesitant to use them unless it's sure about the number of rows.
You can use the forceseek
query hint to force an index lookup:
inner join TableTwo t2 with (FORCESEEK) on t1.ForeignKeyId = t2.PrimaryKeyId
Alternatively, you can force a loop join with the loop
keyword:
inner LOOP join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
Query hints limit SQL Server's freedom, so it can no longer adapt to changed circumstances. It's best practice to avoid query hints unless there is a business need that cannot be met without them.
精彩评论