Why is INTERSECT as slow as a nested JOIN?
I'm using MS SQL.
I have a huge table with indices to make this query fast:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.
I would expect this query to also complete in about a second:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'
But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.
I would expect SQL Server internally to throw the results from each of these subqueries into a hash table and do a hash intersection, which should be O(n). The result sets are small enough to fit in memory, so I doubt it's an IO issue.
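To make that expectation concrete, here is a minimal sketch of the hash intersection I have in mind, written in Python. It is not what SQL Server actually does; the function name `hash_intersect` and the sample IDs are made up for illustration.

```python
def hash_intersect(*result_sets):
    """Intersect several iterables of user IDs using hash sets."""
    if not result_sets:
        return set()
    # Deduplicate each input, then start from the smallest one so the
    # working set can only shrink from there.
    ordered = sorted((set(rs) for rs in result_sets), key=len)
    common = ordered[0]
    for other in ordered[1:]:
        # Each membership probe is O(1) on average, so one pass is linear.
        common = {uid for uid in common if uid in other}
    return common


if __name__ == "__main__":
    # Hypothetical user IDs standing in for the three subquery results.
    place_a = [1, 2, 3, 5, 5]
    place_b = [2, 3, 4, 5]
    place_c = [9, 5, 3]
    print(sorted(hash_intersect(place_a, place_b, place_c)))
```

With three inputs of about 10k rows each, this is a few tens of thousands of hash operations, which is why 20 seconds looks wrong to me.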
I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.
Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?
Give this a try instead. It's untested, but I think it will get you the results you want: it reads the table once and keeps only the users who show up for all three places.
select userid
from IncrementalStatistics
where IncrementalStatisticsTypeID = 5
and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)
and IncrementalStatistics.Created > '12/2/2010'
group by userid
having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
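Since the query above is untested, here is a rough way to check its logic against the INTERSECT version. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MS SQL, so it says nothing about performance or query plans on SQL Server. The sample rows, the helper names, and the reading of '12/2/2010' as December 2, 2010 (written in ISO form so SQLite compares it correctly as text) are all assumptions.

```python
import sqlite3

PLACES = (47828, 40652, 14403)


def build_demo_db():
    """Create a tiny in-memory copy of the table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IncrementalStatistics ("
        " userid INTEGER, IncrementalStatisticsTypeID INTEGER,"
        " AssociatedPlaceID INTEGER, Created TEXT)"
    )
    rows = [
        (1, 5, 47828, "2010-12-05"),  # user 1: all three places
        (1, 5, 40652, "2010-12-06"),
        (1, 5, 14403, "2010-12-07"),
        (2, 5, 47828, "2010-12-05"),  # user 2: only two places,
        (2, 5, 47828, "2010-12-08"),  # one of them visited twice
        (2, 5, 40652, "2010-12-09"),
        (3, 5, 47828, "2010-12-05"),  # user 3: third row is too old
        (3, 5, 40652, "2010-12-05"),
        (3, 5, 14403, "2010-11-30"),
        (4, 7, 47828, "2010-12-05"),  # user 4: first row has wrong type
        (4, 5, 40652, "2010-12-05"),
        (4, 5, 14403, "2010-12-05"),
        (5, 5, 47828, "2010-12-03"),  # user 5: all three, with repeats
        (5, 5, 47828, "2010-12-04"),
        (5, 5, 40652, "2010-12-04"),
        (5, 5, 14403, "2010-12-04"),
    ]
    conn.executemany(
        "INSERT INTO IncrementalStatistics VALUES (?, ?, ?, ?)", rows
    )
    return conn


def users_via_intersect(conn):
    """The original approach: one filtered SELECT per place, intersected."""
    one = (
        "SELECT userid FROM IncrementalStatistics"
        " WHERE IncrementalStatisticsTypeID = 5"
        " AND AssociatedPlaceID = ? AND Created > '2010-12-02'"
    )
    sql = " INTERSECT ".join([one] * len(PLACES))
    return sorted(r[0] for r in conn.execute(sql, PLACES))


def users_via_group_by(conn):
    """The suggested approach: single pass, then count distinct places."""
    sql = (
        "SELECT userid FROM IncrementalStatistics"
        " WHERE IncrementalStatisticsTypeID = 5"
        " AND AssociatedPlaceID IN (?, ?, ?)"
        " AND Created > '2010-12-02'"
        " GROUP BY userid"
        " HAVING COUNT(DISTINCT AssociatedPlaceID) = ?"
    )
    return sorted(r[0] for r in conn.execute(sql, PLACES + (len(PLACES),)))


if __name__ == "__main__":
    conn = build_demo_db()
    print(users_via_intersect(conn))
    print(users_via_group_by(conn))
```

The `distinct` in the `having` clause matters: without it, a user with repeat rows for a single place could reach a count of 3 without having been to all three places.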