开发者

Why are self joins faster than or?

I'm trying to filter a relationship table down to get a subset of the table where two conditions are met (ie: I want all of the id's of the entries who's color_ids are 1 or 2). It's a beefy table, so I'm trying to optimize as much as possible.

I was wondering if anyone could explain my finding in this case:

Why is

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
开发者_高级运维  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   AND b.color_id = 2;

faster than

SELECT DISTINCT id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

in MySql 4.1?


The two are not the same query and should not be giving the same result set. In the first query you want all the records which meet both conditions, you have a record with a color_id = of 1 and a record with a color_id of 2 for the same ID. In the second query you will get all records that have both color ids and all records that have only one or the other. Of course since you are asking for a differnt field to be returned you might not see this. And the second query is somewhat silly anyway as it can be expressed as:

select 1 as color id 
union all
select 2

And never hit a table at all. That would make it super fast.


The first query is impossible and will never return a result set. It's basically saying "Give me all the records in the table where color_id is 1 AND color_id is 2" which can never happen.

If you want to ask the difference between

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   OR b.color_id = 2;

versus

SELECT DISTINCT color_id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

In this case the first will always be slower than the second for large tables. The first results in a full table scan for table A while the second one uses the indexes that should be used in the where clause.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜