Are unbound result sets more performant than a query with joins when used with IN clause?

2023-03-22 04:55 问答作者：

With a subquery within the IN clause, am I better off returning a large number of results (10,000+ rows) or using table joins to restrict the results?

For example:

select * from Units u
  inner join pm_Properties p on p.PropertyId = u.PropertyId
  where p.State = 'CA'
  and u.UnitId in (select UnitId from Leases l where l.IsActive = 1)

select * from Units u
  inner join pm_Properties p on p.PropertyId = u.PropertyId
  where p.State = 'CA'
  and u.UnitId in (select UnitId from Leases l 
    inner join Units u on u.UnitId = l.UnitId 
    inner join Properties p on p.PropertyId = u.PropertyId 
    where l.IsActive = 1 and p.State = 'CA')

In both cases, I do want to return unit results joined with properties. I'm just curious about the performance difference regar开发者_如何学运维ding the IN statement. Additionally, if it matters, the server is MS SQL 2008 R2.

Use the first one (unless you have performance problems for some reason).

The second one is just supplying redundant information that SQL server will be able to work out for itself - all you have done is increase the complexity of the query increasing the cost of working out the execution plan and increasing the chances that it won't be able to find the optimal plan.

I think the most efficient way to let the optimizer do its job is to avoid both versions with subqueries. Why not this?

SELECT * 
  FROM Units AS u
  INNER JOIN pm_Properties AS p
  ON p.PropertyId = u.PropertyId
  INNER JOIN Leases AS l
  ON l.UnitId = u.UnitId
  AND l.IsActive = 1
  WHERE p.State = 'CA';

Based on JNK's suggestion, yet another alternative is as follows (but as I suggested, there may be a whole lot of I/O checking for valid leases if there are a large percentage not in CA):

SELECT *
    FROM Units AS u
    INNER JOIN pm_Properties AS p
    ON p.PropertyId = u.PropertyId
    WHERE p.State = 'CA'
    AND EXISTS
    (
      SELECT 1 FROM Leases AS l
         WHERE l.UnitId = u.UnitId
         AND l.IsActive = 1
    );

Of course, you know your data, your indexes, your hardware, your usage patterns, etc. The most effective way to determine which route you're "better off" with is to test them yourself - this will give you a more reliable answer than anyone you can ask here. No offense to anyone, but there are too many variables to be able to make a blanket statement.

Also - and this is just a nit - what is the point of making a table alias Units -> unit? You typed 4 extra characters to avoid typing one - which might make sense if you planned to reference Units more than 4 times in the query, but especially in that case I would rather make a shorter alias.

As JNK implied, it depends.

It depends on how complex your join conditions must be, and how many rows each table has relative to the other tables. I think in a "normal" situation your first example without the joins inside the IN clause will be faster, but that will vary by the factors I mentioned.

So, if you are really trying to absolutely optimize performance, I would test them both, and put whichever comes out better for your particular situation into production. But, if they were even close I would strongly prefer the first one (which I think will win in most use cases anyway). It is easier for the optimizer to work with and more importantly it is easier to read which means it will be easier to maintain.

继续阅读：performance sql-server

Are unbound result sets more performant than a query with joins when used with IN clause?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？