MySQL "in" statement
This question is really just out of curiosity. Can someone shed some light onto why this query:
insert into c (
select a.* from a,b
where a.x=8 and y=1 and b.z=a.z and active > '2010-01-07 00:00:00'
group by a.z
)
is a million times faster (or close enough) than the query below whenever there's quite a few records involved?
insert into c (
select * from a
where x=8 and y=1 and z in (
select z from b 开发者_高级运维where x=8 and active > '2010-01-07 00:00:00' group by z
)
)
What I mean is that a query joining a table is so much faster than using IN, but why is this?
Becuase the subquery is being performed once for every row in the enclosing query, and the join is only performed once.
Databases are optimised for set based queries, so most of the time joins will be faster than subqueries.
You are certainly not the only one to experience this (here is one example). Looks like the query optimizer for MySql doesn't optimise subqueries of this type.
In response to Tim's comment, no joins are very different.
Let us say that table a has 10,000 rows, and table b has 1,000 rows. The subquery approach runs the inner query for every row of the outer query. That results in 10,000 x 1,000 = 10,000,000 rows to process.
Depending on the fields involved in the join (and whether they are indexed), the join may only have to pass over each table once, resulting in ~11,000 rows processed. It may even only have to read each row of the smaller table and discard the remainder of the larger table resulting in even fewer reads.
It all depends on the join algorithm and the database engine.
精彩评论