Which of those 2 MySQL queries is more efficient?
SELECT *
FROM openorders_tracking
WHERE id NOT IN (SELECT tracking_id FROM openorders_commission)
SELECT *
FROM openorders_tracking
LEFT JOIN openorders_commission
ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL
I'm wondering both specifically for this query, and in general if there is a significant difference.
I don't have any data yet, nor indexes yet. I guess it's more in general if "NOT IN" is better than a JOIN or vice-versa.
For those who like EXPLAIN, here is the results for now (again, no indexes yet besides the primaries):
mysql> explain SELECT * FROM openorders_tracking WHERE id NOT IN (SELECT trackin
g_id FROM openorders_commission);
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------开发者_高级运维------------+
| 1 | PRIMARY | openorders_tracking | ALL | NULL | NUL
L | NULL | NULL | 341 | Using where |
| 2 | DEPENDENT SUBQUERY | openorders_commission | system | NULL | NUL
L | NULL | NULL | 0 | const row not found |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
2 rows in set (0.00 sec)
mysql> explain SELECT * FROM openorders_tracking LEFT JOIN openorders_commission
ON openorders_tracking.id=openorders_commission.tracking_id WHERE openorders_co
mmission.id IS NULL;
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key
_len | ref | rows | Extra |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
| 1 | SIMPLE | openorders_commission | system | PRIMARY | NULL | NUL
L | NULL | 0 | const row not found |
| 1 | SIMPLE | openorders_tracking | ALL | NULL | NULL | NUL
L | NULL | 341 | |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
2 rows in set (0.00 sec)
It's an odd thing to ask when you have the two queries and the database in front of you. Try running them, and use EXPLAIN to see the execution plans.
My guess is that MySQL will optimize them both into the same execution plan but that might depending column types and your indexing scheme.
Attacking this from a purely software development approach I would say that this is premature optimization, what you should be striving for is readability. As to which query is more readable would be the call of you and your team. Whilst this doesn't answer the question, which I think should be answered (not by me but someone more qualified like a DBA), you should always think about what you are going to gain by optimizing.
Taken from Wikipedia (Program optimization)
When to optimize
Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.
Donald Knuth made the following two statements on optimization:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
(He also attributed the quote to Tony Hoare several years later, although this might have been an error as Hoare disclaims having coined the phrase.)
"In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"
"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing.
An alternative approach is to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized. A simple and elegant design is often easier to optimize at this stage, and profiling may reveal unexpected performance problems that would not have been addressed by premature optimization.
In practice, it is often necessary to keep performance goals in mind when first designing software, but the programmer balances the goals of design and optimization.
I was told to limit the number of SELECTs per query, so based on that I would say the JOIN is the most efficient.
I would use the top query. It's easier to understand, IMHO and its use of "select *" will only select the columns from the table you're after.
However, it's very possible they will have the same execution plan.
SELECT *
FROM openorders_tracking
LEFT JOIN openorders_commission
ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL
The above said query will be more efficient. They may produce the same performance this time but once you do proper indexing, this query will help you always. So try to index your tables and test both the queries with worst case scenario (try to insert more and more records and you will see the difference)
I have always understood the NOT EXISTS variant of this type of query to be more efficient normally, as it only checks for the first occurence of the queried value in the lookup table - eg:
SELECT *
FROM openorders_tracking t
WHERE NOT EXISTS
(SELECT NULL FROM openorders_commission c
WHERE c.tracking_id = t.id)
As ever, check the actual performance of the query to determine which is faster.
精彩评论