Why is my SQL 'NOT IN' clause producing different results from 'NOT EXISTS'?

2023-01-02 01:59

I have two SQL queries that produce different results, although I would expect them to produce the same result. I am trying to find the number of events that do not have a corresponding location. All locations have an event, but events can also link to non-location records.

The following query produces a count of 16244, the correct value.

SELECT COUNT(DISTINCT e.event_id)   
FROM   events AS e   
WHERE  NOT EXISTS   
  (SELECT * FROM locations AS l WHERE l.event_id = e.event_id)

The following query produces a count of 0.

SELECT COUNT(DISTINCT e.event_id) 
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT  l.event_id FROM locations AS l)

The following SQL does some summaries of the data set

SELECT  'Event Count', 
        COUNT(DISTINCT event_id) 
        FROM events

UNION ALL

SELECT  'Locations Count', 
        COUNT(DISTINCT event_id) 
        FROM locations

UNION ALL

SELECT  'Event+Location Count', 
        COUNT(DISTINCT l.event_id) 
        FROM locations AS l  JOIN events AS e ON l.event_Id = e.event_id

And returns the following results

Event Count             139599
Locations Count         123355
Event+Location Count    123355

Can anyone shed any light on why the two initial queries do not produce the same figure?

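You have a NULL in the result of the subquery SELECT l.event_id FROM locations AS l. When that happens, NOT IN can never evaluate to true: it is false for an event_id that matches a location and unknown for every other event_id. Rows are returned only when the predicate is true, so the query returns 0 results.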

SELECT COUNT(DISTINCT e.event_id) 
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT  l.event_id FROM locations AS l)

The reason for this behaviour can be seen from the example below.

'x' NOT IN (NULL,'a','b')

≡ 'x' <> NULL and 'x' <> 'a' and 'x' <> 'b'

≡ Unknown and True and True

≡ Unknown
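The expansion above can be checked directly. This is a minimal sketch, assuming SQL Server with the default ANSI_NULLS setting; the literals are the same illustrative values as in the example, and the column alias is made up for this demonstration.

```sql
-- The NULL in the list makes the predicate Unknown, so the ELSE branch is taken.
SELECT CASE WHEN 'x' NOT IN (NULL, 'a', 'b')
            THEN 'true'
            ELSE 'false or unknown'
       END AS with_null;      -- returns 'false or unknown'

-- Without the NULL, every comparison is True, so the predicate is True.
SELECT CASE WHEN 'x' NOT IN ('a', 'b')
            THEN 'true'
            ELSE 'false or unknown'
       END AS without_null;   -- returns 'true'
```

A WHERE clause keeps a row only when its predicate is true, which is why the Unknown case filters out every row.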

The NOT IN form treats NULLs differently from NOT EXISTS. The statement does not fail, but a single NULL in the list means the predicate can never evaluate to true, so no rows are returned.

So you have at least one event_id in locations that is NULL.
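One way to confirm this, and to keep the NOT IN form if you prefer it, is sketched below using the table and column names from the question. Note that COUNT(DISTINCT event_id) ignores NULLs, which would explain why the summary counts in the question did not reveal the problem.

```sql
-- Count the NULL event_ids that COUNT(DISTINCT event_id) silently skips.
SELECT COUNT(*) AS null_event_ids
FROM   locations
WHERE  event_id IS NULL;

-- NOT IN with the NULLs filtered out of the subquery.
SELECT COUNT(DISTINCT e.event_id)
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT l.event_id
                          FROM   locations AS l
                          WHERE  l.event_id IS NOT NULL);
```

If the first query returns a value greater than 0, the second query should then give the same count as the NOT EXISTS version.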

Also, your query might be better written as a join:

SELECT 
    COUNT(DISTINCT e.event_id)    
FROM
    events AS e  
    LEFT JOIN locations AS l ON e.event_id = l.event_id
WHERE
    l.event_id IS NULL

[UPDATE: apparently, the NOT EXISTS version is faster.]
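To see all three forms side by side, here is a small self-contained repro. It is only a sketch: the temporary tables #events and #locations and their values are hypothetical stand-ins for the real tables, and it uses single-row INSERT statements so that it should also run on older SQL Server versions.

```sql
-- Hypothetical miniature of the two tables; names and values are illustrative only.
CREATE TABLE #events    (event_id INT NOT NULL);
CREATE TABLE #locations (event_id INT NULL);

INSERT INTO #events VALUES (1);
INSERT INTO #events VALUES (2);
INSERT INTO #events VALUES (3);

INSERT INTO #locations VALUES (1);
INSERT INTO #locations VALUES (NULL);  -- the row that breaks NOT IN

-- NOT EXISTS: events 2 and 3 have no matching location, so this counts 2.
SELECT COUNT(DISTINCT e.event_id) AS not_exists_count
FROM   #events AS e
WHERE  NOT EXISTS (SELECT * FROM #locations AS l WHERE l.event_id = e.event_id);

-- LEFT JOIN ... IS NULL: same anti-join, also counts 2.
SELECT COUNT(DISTINCT e.event_id) AS left_join_count
FROM   #events AS e
       LEFT JOIN #locations AS l ON e.event_id = l.event_id
WHERE  l.event_id IS NULL;

-- NOT IN: the NULL makes the predicate false or unknown for every row, so this counts 0.
SELECT COUNT(DISTINCT e.event_id) AS not_in_count
FROM   #events AS e
WHERE  e.event_id NOT IN (SELECT l.event_id FROM #locations AS l);

DROP TABLE #events;
DROP TABLE #locations;
```

Deleting the NULL row from #locations and rerunning should make all three counts agree.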

IN and EXISTS are processed very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is typically evaluated, de-duplicated, indexed (or hashed or sorted), and then joined to the original table.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x ) )
      then 
         OUTPUT THE RECORD
      end if
   end loop

Processed this way, the EXISTS form always results in a full scan of T1, whereas the IN form can make use of an index on T1(x).

Tags: sql, sql-server-2005
