Are unbound result sets more performant than a query with joins when used with IN clause?
With a subquery within the IN clause, am I better off returning a large number of results (10,000+ rows) or using table joins to restrict the results?
For example:
select * from Units u
inner join pm_Properties p on p.PropertyId = u.PropertyId
where p.State = 'CA'
and u.UnitId in (select UnitId from Leases l where l.IsActive = 1)
vs
select * from Units u
inner join pm_Properties p on p.PropertyId = u.PropertyId
where p.State = 'CA'
and u.UnitId in (select UnitId from Leases l
inner join Units u on u.UnitId = l.UnitId
inner join Properties p on p.PropertyId = u.PropertyId
where l.IsActive = 1 and p.State = 'CA')
In both cases, I do want to return unit results joined with properties. I'm just curious about the performance difference regar开发者_如何学运维ding the IN statement. Additionally, if it matters, the server is MS SQL 2008 R2.
Use the first one (unless you have performance problems for some reason).
The second one is just supplying redundant information that SQL server will be able to work out for itself - all you have done is increase the complexity of the query increasing the cost of working out the execution plan and increasing the chances that it won't be able to find the optimal plan.
I think the most efficient way to let the optimizer do its job is to avoid both versions with subqueries. Why not this?
SELECT *
FROM Units AS u
INNER JOIN pm_Properties AS p
ON p.PropertyId = u.PropertyId
INNER JOIN Leases AS l
ON l.UnitId = u.UnitId
AND l.IsActive = 1
WHERE p.State = 'CA';
Based on JNK's suggestion, yet another alternative is as follows (but as I suggested, there may be a whole lot of I/O checking for valid leases if there are a large percentage not in CA):
SELECT *
FROM Units AS u
INNER JOIN pm_Properties AS p
ON p.PropertyId = u.PropertyId
WHERE p.State = 'CA'
AND EXISTS
(
SELECT 1 FROM Leases AS l
WHERE l.UnitId = u.UnitId
AND l.IsActive = 1
);
Of course, you know your data, your indexes, your hardware, your usage patterns, etc. The most effective way to determine which route you're "better off" with is to test them yourself - this will give you a more reliable answer than anyone you can ask here. No offense to anyone, but there are too many variables to be able to make a blanket statement.
Also - and this is just a nit - what is the point of making a table alias Units -> unit? You typed 4 extra characters to avoid typing one - which might make sense if you planned to reference Units more than 4 times in the query, but especially in that case I would rather make a shorter alias.
As JNK implied, it depends.
It depends on how complex your join conditions must be, and how many rows each table has relative to the other tables. I think in a "normal" situation your first example without the joins inside the IN clause will be faster, but that will vary by the factors I mentioned.
So, if you are really trying to absolutely optimize performance, I would test them both, and put whichever comes out better for your particular situation into production. But, if they were even close I would strongly prefer the first one (which I think will win in most use cases anyway). It is easier for the optimizer to work with and more importantly it is easier to read which means it will be easier to maintain.
精彩评论