SQL select with "IN" subquery returns no records if the sub-query contains NULL

I came across this interesting behavior. I know a left join is the way to go, but I would still like to have this cleared up. Is it a bug or is it by design? Any explanations?

When I select records from the left table where a value is not present in the result of a subquery on the right table, the expected "missing" record is not returned if the subquery result contains NULLs. I expected the two ways of writing this query to be equivalent.

Thanks!

declare @left table  (id int not null primary key identity(1,1), ref int null)
declare @right table (id int not null primary key identity(1,1), ref int null)

insert @left (ref) values (1)
insert @left (ref) values (2)

insert @right (ref) values (1)
insert @right (ref) values (null)

print 'unexpected empty resultset:'
select * from @left
where ref not in (select ref from @right)

print 'expected result - ref 2:'
select * from @left
where ref not in (select ref from @right where ref is not null)

print 'expected result - ref 2:'
select l.* from @left l
  left join @right r on r.ref = l.ref
where r.id is null

print @@version

gives:

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
unexpected empty resultset:
id          ref
----------- -----------

(0 row(s) affected)

expected result - ref 2:
id          ref
----------- -----------
2           2

(1 row(s) affected)

expected result - ref 2:
id          ref
----------- -----------
2           2

(1 row(s) affected)

Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64) 
    Apr  2 2010 15:48:46 
    Copyright (c) Microsoft Corporation
    Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2) (Hypervisor)


This is by design. If the match fails and the set contains NULL the result is NULL, as specified by the SQL standard.

'1' IN ('1', '3') => true
'2' IN ('1', '3') => false
'1' IN ('1', NULL) => true
'2' IN ('1', NULL) => NULL

'1' NOT IN ('1', '3') => false
'2' NOT IN ('1', '3') => true
'1' NOT IN ('1', NULL) => false
'2' NOT IN ('1', NULL) => NULL

Informally, the logic behind this is that NULL can be thought of as an unknown value. For example, here it doesn't matter what the unknown value is: '1' is clearly in the set, so the result is true.

'1' IN ('1', NULL) => true

In the following example we can't be sure that '2' is in the set, but since we don't know all the values we also can't be sure that it isn't in the set. So the result is NULL.

'2' IN ('1', NULL) => NULL

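Another way of looking at it is by rewriting X NOT IN (Y, Z) as X <> Y AND X <> Z. When Z is NULL, X <> Z evaluates to NULL, so the whole expression is either NULL (when X <> Y is true) or false (when X <> Y is false), but never true. This follows from the rules of three-valued logic: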

true AND NULL => NULL
false AND NULL => false
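
The rewrite above can be checked directly. This is a minimal sketch, assuming the default ANSI_NULLS ON setting; the column aliases are made up for illustration. It evaluates '2' NOT IN ('1', NULL) both as written and in its expanded form, and labels the outcome as true, false, or NULL:

```sql
-- Sketch: classify a predicate as true, false, or NULL (unknown).
-- Assumes the default setting ANSI_NULLS ON.
select
  -- The predicate as written.
  case
    when '2' not in ('1', null)       then 'true'
    when not ('2' not in ('1', null)) then 'false'
    else 'NULL'
  end as not_in_form,
  -- The same predicate expanded into <> comparisons joined by AND.
  case
    when '2' <> '1' and '2' <> null       then 'true'
    when not ('2' <> '1' and '2' <> null) then 'false'
    else 'NULL'
  end as expanded_form
```

Under that assumption both columns should come back as 'NULL': neither the predicate nor its negation is true, which is why the WHERE clause in the question rejects every row.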


Yes, this is how it was designed. There are also many other considerations when choosing between a LEFT JOIN and a NOT IN. See this link for a very good explanation of this behavior.


That is how the ANSI committee thinks it has to be done.

You can precede your queries with

set ansi_defaults OFF

and you get the result that you expect.

Since SQL Server 7.0, Microsoft has been rather strict about following ANSI standards.

EDIT:

Don't fight against the defaults. You will give up in the end.


The root cause of the behavior is explained by Mark. It can be resolved in more than one way: a LEFT JOIN, filtering NULL values out of the inner query (either in its WHERE clause or in its SELECT clause), or using a correlated subquery, to name a few.

The following three short posts form a case study on the same subject: NOT IN Subquery return zero rows -Issue, NOT IN Subquery return zero rows -Root Cause, NOT IN Subquery return zero rows -Workarounds
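
The question already shows the LEFT JOIN and the NULL-filtering variants. The correlated subquery option could look like the sketch below, which reuses the question's table variables and data. NOT EXISTS only checks whether the inner query returns any row, so a comparison against a NULL ref simply fails to match instead of turning the whole predicate into NULL.

```sql
-- Same setup as in the question.
declare @left table  (id int not null primary key identity(1,1), ref int null)
declare @right table (id int not null primary key identity(1,1), ref int null)

insert @left (ref) values (1)
insert @left (ref) values (2)

insert @right (ref) values (1)
insert @right (ref) values (null)

-- Correlated subquery: keep the left rows that have no matching right row.
-- The NULL in @right never satisfies r.ref = l.ref, so it is ignored.
select l.* from @left l
where not exists (select 1 from @right r where r.ref = l.ref)
```

With this data the query should return the single row with ref 2, the same as the LEFT JOIN version.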
