开发者

Does using WHERE IN hurt query performance?

I've heard that using an IN Clause can hurt performance because it doesn't use Indexes properly. See example below:

SELECT ID, Name, Address 
FROM people
WHERE id IN (SELECT ParsedValue FROM UDF_Pa开发者_开发问答rseListToTable(@IDList))

Is it better to use the form below to get these results?

SELECT ID,Name,Address
FROM People as p
INNER JOIN UDF_ParseListToTable(@IDList) as ids
ON p.ID = ids.ParsedValue

Does this depend on which version of SQL Server you are using? If so which ones are affected?


Yes, assuming relatively large data sets.

It's considered better to use EXISTS for large data sets. I follow this and have noticed improvements in my code execution time.

According to the article, it has to do with how the IN vs. EXISTS is internalized. Another article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx


It's very simple to find out - open Management studio, put both versions of the query in, then run with the Show Execution plan turned on. Compare the two execution plans. Often, but not always, the query optimizer will make the same exact plan / literally do the same thing for different versions of a query that are logically equivalent.

In fact, that's its purpose - the goal is that the optimizer would take ANY version of a query, assuming the logic is the same, and make an optimal plan. Alas, the process isn't perfect.

Here's one scientific comparison:

http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/


IN can hurt performance because SQL Server must generate a complete result set and then create potentially a huge IF statement, depending on the number of rows in the result set. BTW, calling a UDF can be a real performance hit as well. They are very nice to use but can really impact performance, if you are not careful. You can Google UDF and Performance to do some research on this.


More than the IN or the Table Variable, I would think that proper use of an Index would increase the performance of your query.

Also, from the table name, it does not seem like you are going to have a lot of entries in it so which way you go may be moot point in this particular example.

Secondly, IN will be evaluated only once since there is no subquery. In your case, the @IDList variable is probably going to cause mistmatches you will need @IDList1, @IDList2, @IdList3.... because IN demands a list.

As a general rule of thumb, you should avoid IN with subqueries and use EXISTS with a join - you will get better performance more often than not.


Your first example is not the same as your second example, because WHERE X IN (@variable) is the same as WHERE X = @variable (i.e. you cannot have variable lists).

Regarding performance, you'll have to look at the execution plans to see what indexes are chosen.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜