Performance Penalties for Unused Joins
I'm writing a script that generates a report based on a query that uses several tables joined together. One of the inputs to the script is going to be a list of the fields that are required on the report. Depending on the fiel开发者_Go百科ds requested, some of the tables might not be needed. My question is: is there a [significant] performance penalty for including a join when if it is not referenced in a SELECT or WHERE clause?
Consider the following tables:
mysql> SELECT * FROM `Books`;
+----------------------+----------+
| title | authorId |
+----------------------+----------+
| Animal Farm | 3 |
| Brave New World | 2 |
| Fahrenheit 451 | 1 |
| Nineteen Eighty-Four | 3 |
+----------------------+----------+
mysql> SELECT * FROM `Authors`;
+----+----------+-----------+
| id | lastName | firstName |
+----+----------+-----------+
| 1 | Bradbury | Ray |
| 2 | Huxley | Aldous |
| 3 | Orwell | George |
+----+----------+-----------+
Does
SELECT
`Authors`.`lastName`
FROM
`Authors`
WHERE
`Authors`.`id` = 1
Outperform:
SELECT
`Authors`.`lastName`
FROM
`Authors`
JOIN
`Books`
ON `Authors`.`id` = `Books`.`authorId`
WHERE
`Authors`.`id` = 1
?
It seems to me that MySQL should just know to ignore the JOIN completely, since the table is not referenced in the SELECT or WHERE clause. But somehow I doubt this is the case. Of course, this is a really basic example. The actual data involved will be much more complex.
And really, it's not a terribly huge deal... I just need to know if my script needs to be "smart" about the joins, and only include them if the fields requested will rely on them.
This isn't actually unused since it means that only Authors that exist in Books are included in the result set.
JOIN
`Books`
ON `Authors`.`id` = `Books`.`authorId`
However if you "knew" that every Author existed in Book than there would be some performance benefit in removing the join but it would largely depend on idexes and the number of records in the table and the logic in the join (especially when doing data conversions)
This is the kind of question that is impossible to answer. Yes, adding the join will take additional time; it's impossible to tell whether you'd be able to measure that time without, well, uh....measuring the time.
Broadly speaking, if - like in your example - you're joining on primary keys, with unique indices, it's unlikely to make a measurable difference.
If you've got more complex joins (which you hint at), or are joining on fields without an index, or if your join involves a function, the performance penalty may be significant.
Of course, it may still be easier to do it this way that write multiple queries which are essentially the same, other than removing unneeded joins.
Final bit of advice - try abstracting the queries into views. That way, you can optimize performance once, and perhaps write your report queries in a more simple way...
Joins will always take time.
Side effects
On top of that inner join
(which is the default join) influences the result by limiting the number of rows you get.
So depending on whether all authors
are in books
the two queries may or may not be identical.
Also if an author
has written more than one book
the resultset of the 'joined' query will show duplicate results.
Performance
In the WHERE
clause you have stated authors.id
to be a constant =1
, therefore (provided you have indexes on author.id
and books.author_id
) it will be a very fast lookup for both tables. The query-time between the two tables will be very close.
In general joins can take quite a lot of time though and with all the added side effects should only be undertaken if you really want to use the extra info the join offers.
It seems that there are two things that you are trying to determine: If there are any optimizations that can be done between the two select statements, and which of the two would be the fastest to execute.
It seems that since the join really is limiting the returned results by authors who have books in the list, that there can not be that much optimization done.
It also seems that for the case that you were describing where the joined table really has no limiting effect on the returned results, that the query where there was no joining of the tables would perform faster.
精彩评论