Sub-query Optimization Talk with an example case
I need advises and want to share my experience about Query Optimization. This week, I found myself stuck in an interesting dilemma. I'm a novice person in mySql (2 years theory, less than one practical)
Environment :
I have a table that contains articles with a column 'type', and another table article_version that contain a date where an article is added in the DB, and a third table that contains all the article types along with types label and stuffs...
The 2 first tables are huge (800000+ fields and growing daily), the 3rd one is naturally small sized. The article tables have a lot of column, but we will only need 'ID' and 'type' in articles and 'dateAdded' in article_version to simplify things...
What I want to do :
A Query that, for a specified 'dateAdded', returns the number o开发者_运维百科f articles for each types (there is ~ 50 types to scan). What was already in place is 50 separate count, one for each document types oO ( not efficient, long(~ 5sec in general), ).
I wanted to do it all in one query and I came up with that :
SELECT type,
(SELECT COUNT(DISTINCT articles.ID)
FROM articles
INNER JOIN article_version
ON article_version.ARTI_ID = legi_arti.ID
WHERE type = td.NEW_ID
AND dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
GROUP BY td.NEW_ID;
The external select (type_document) allow me to get the 55 types of documents I need. The sub-Query is counting the articles for each type_document for the given date '2009-01-01'.
A common result is like :
* type * nbrArti * ************************* * 123456 * 23 * * 789456 * 5 * * 16578 * 98 * * .... * .... * * .... * .... * *************************
This query get the job done, but the join in the sub-query is making this extremely slow, The reason, if I'm right, is that a join is made by the server for each types, so 50+ times, this solution is even more slower than doing the 50 queries independently for each types, awesome :/
A Solution
I came up with a solution myself that drastically improve the performance with the same result, I just created a view corresponding to the subQuery, making the join on ids for each types... And Boom, it's f.a.s.t.
I think, correct me if I'm wrong, that the reason is the server only runs the JOIN statement once.
This solution is ~5 time faster than the solution that was already there, and ~20 times faster than my first attempt. Sweet
Questions / thoughts
- With yet another view, I'll now need to check if I don't loose more than win when documents get inserted...
- Is there a way to improve the original Query, by getting the JOIN statement out of the sub-query? (And getting rid of the view)
- Any other tips/thoughts? (In Server Optimizing for example...)
Apologies for my approximating English, it'is not my primary language.
You cannot create a single index on (type, date_added)
, because these fields are in different tables.
Without the view, the subquery most probably selects article
as a leading table and the index on type
which is not very selective.
By creating the view, you force the subquery to calculate the sums for all types first (using a selective the index on date
) and then use a JOIN BUFFER
(which is fast enough for only 55
types).
You can achieve similar results by rewriting your query as this:
SELECT new_id, COALESCE(cnt, 0) AS cnt
FROM type_document td
LEFT JOIN
(
SELECT type, COUNT(DISTINCT article_id) AS cnt
FROM article_versions av
JOIN articles a
ON a.id = av.article_id
WHERE av.date = '2009-01-01 00:00:00'
GROUP BY
type
) q
ON q.type = td.new_id
Unfortunately, MySQL
is not able to do table spools or hash joins, so to improve the performance you'll need to denormalize your tables: add type
to article_version
and create a composite index on (date, type)
.
精彩评论