Sub-query Optimization Talk with an example case

I need advice and want to share my experience with query optimization. This week, I found myself stuck in an interesting dilemma. I'm a novice in MySQL (2 years of theory, less than one of practice).

Environment :

I have a table articles with a column 'type', a second table article_version that contains the date an article was added to the DB, and a third table that contains all the article types along with their labels and so on.

The first 2 tables are huge (800,000+ rows and growing daily); the 3rd one is naturally small. The articles table has a lot of columns, but we only need 'ID' and 'type' in articles and 'dateAdded' in article_version to simplify things.

What I want to do :

A query that, for a specified 'dateAdded', returns the number of articles for each type (there are ~50 types to scan). What was already in place is 50 separate counts, one for each document type (not efficient and slow, ~5 sec in general).

I wanted to do it all in one query and I came up with that :

SELECT type,
  (SELECT COUNT(DISTINCT articles.ID)
    FROM articles
      INNER JOIN article_version
        ON article_version.ARTI_ID = articles.ID
    WHERE type = td.NEW_ID
      AND dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
GROUP BY td.NEW_ID;

The outer select (type_document) gives me the 55 document types I need. The sub-query counts the articles of each document type for the given date '2009-01-01'.

A typical result looks like:


*   type   *  nbrArti   *
*************************
* 123456   * 23         *
* 789456   * 5          *
* 16578    * 98         *
* ....     * ....       *
* ....     * ....       *
*************************

This query gets the job done, but the join in the sub-query makes it extremely slow. The reason, if I'm right, is that the server performs the join once for each type, so 50+ times. This solution is even slower than running the 50 queries independently, awesome :/

A Solution

I came up with a solution myself that drastically improves the performance with the same result: I just created a view corresponding to the sub-query, doing the join on IDs for all types... And boom, it's f.a.s.t.

I think, correct me if I'm wrong, that the reason is the server only runs the JOIN statement once.
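
The post does not show the view itself, so the following is only a guess at its shape. It is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL; the view name article_count_by_type, the helper counts_for_date and the sample rows are all hypothetical. It illustrates the result shape only and says nothing about MySQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_document (NEW_ID TEXT);
CREATE TABLE articles (ID INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_version (ARTI_ID INTEGER, dateAdded TEXT);

INSERT INTO type_document VALUES ('123456'), ('789456'), ('');
INSERT INTO articles VALUES (1, '123456'), (2, '123456'), (3, '789456');
INSERT INTO article_version VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),
    (2, '2009-01-01 00:00:00'),
    (3, '2009-01-02 00:00:00');

-- Hypothetical view: the join is written once, grouped by type and date.
CREATE VIEW article_count_by_type AS
SELECT a.type AS type,
       av.dateAdded AS dateAdded,
       COUNT(DISTINCT a.ID) AS nbrArti
FROM articles a
INNER JOIN article_version av ON av.ARTI_ID = a.ID
GROUP BY a.type, av.dateAdded;
""")


def counts_for_date(date_added):
    """Return (type, count) for every non-empty type, 0 when nothing matches."""
    return conn.execute(
        """
        SELECT td.NEW_ID, COALESCE(v.nbrArti, 0)
        FROM type_document td
        LEFT JOIN article_count_by_type v
               ON v.type = td.NEW_ID AND v.dateAdded = ?
        WHERE td.NEW_ID != ''
        ORDER BY td.NEW_ID
        """,
        (date_added,),
    ).fetchall()


view_counts = counts_for_date("2009-01-01 00:00:00")
print(view_counts)
```

Article 1 has two versions on the same date but is counted once because of COUNT(DISTINCT ...), and the LEFT JOIN keeps types that have no article on that date.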

This solution is ~5 times faster than the solution that was already there, and ~20 times faster than my first attempt. Sweet.

Questions / thoughts

  • With yet another view, I now need to check that I don't lose more than I gain when documents get inserted...
  • Is there a way to improve the original query by getting the JOIN out of the sub-query? (And getting rid of the view)
  • Any other tips/thoughts? (In Server Optimizing for example...)

Apologies for my approximate English, it is not my primary language.


You cannot create a single index on (type, date_added), because these fields are in different tables.

Without the view, the subquery most probably picks articles as the leading table and uses the index on type, which is not very selective.

By creating the view, you force the subquery to calculate the counts for all types first (using the selective index on date) and then use a JOIN BUFFER (which is fast enough for only 55 types).

You can achieve similar results by rewriting your query as this:

SELECT  td.NEW_ID, COALESCE(q.cnt, 0) AS cnt
FROM    type_document td
LEFT JOIN
        (
        SELECT  a.type, COUNT(DISTINCT a.ID) AS cnt
        FROM    article_version av
        JOIN    articles a
        ON      a.ID = av.ARTI_ID
        WHERE   av.dateAdded = '2009-01-01 00:00:00'
        GROUP BY
                a.type
        ) q
ON      q.type = td.NEW_ID;
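
To check that this rewrite returns the same rows as the correlated sub-query from the question, here is a minimal sketch on toy data. It runs on Python's built-in sqlite3 rather than MySQL, uses the table and column names from the question, and the sample rows are invented. It only compares results; timings on SQLite say nothing about MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_document (NEW_ID TEXT);
CREATE TABLE articles (ID INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_version (ARTI_ID INTEGER, dateAdded TEXT);

INSERT INTO type_document VALUES ('123456'), ('789456'), ('16578'), ('');
INSERT INTO articles VALUES
    (1, '123456'), (2, '123456'), (3, '789456'), (4, '16578');
INSERT INTO article_version VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),
    (2, '2009-01-01 00:00:00'),
    (3, '2009-01-01 00:00:00'),
    (4, '2009-01-02 00:00:00');
""")

# Shape of the original query: the join is evaluated once per type row.
CORRELATED = """
SELECT td.NEW_ID,
       (SELECT COUNT(DISTINCT a.ID)
          FROM articles a
          INNER JOIN article_version av ON av.ARTI_ID = a.ID
         WHERE a.type = td.NEW_ID
           AND av.dateAdded = :d) AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
ORDER BY td.NEW_ID
"""

# Shape of the rewrite: aggregate once for the date, then join the 50-odd types.
DERIVED = """
SELECT td.NEW_ID, COALESCE(q.cnt, 0) AS nbrArti
FROM type_document td
LEFT JOIN (
    SELECT a.type AS type, COUNT(DISTINCT a.ID) AS cnt
    FROM article_version av
    JOIN articles a ON a.ID = av.ARTI_ID
    WHERE av.dateAdded = :d
    GROUP BY a.type
) q ON q.type = td.NEW_ID
WHERE td.NEW_ID != ''
ORDER BY td.NEW_ID
"""

params = {"d": "2009-01-01 00:00:00"}
correlated_rows = conn.execute(CORRELATED, params).fetchall()
derived_rows = conn.execute(DERIVED, params).fetchall()
print(derived_rows)
```

The sketch keeps the question's filter on empty NEW_ID values so the two queries are directly comparable; COALESCE turns the NULL produced by the LEFT JOIN into the 0 that the sub-query version returns for types without articles.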

Unfortunately, MySQL is not able to do table spools, and hash joins only arrived in MySQL 8.0.18, so on older versions you'll need to denormalize your tables to improve the performance further: add type to article_version and create a composite index on (date, type).
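
The denormalization could look roughly like the sketch below. It is again run on Python's built-in sqlite3 as a stand-in, so the DDL syntax may need adjusting for MySQL; the index name idx_version_date_type and the sample rows are hypothetical, and the column names follow the question (dateAdded, ARTI_ID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (ID INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE article_version (ARTI_ID INTEGER, dateAdded TEXT);

INSERT INTO articles VALUES
    (1, '123456'), (2, '123456'), (3, '789456'), (4, '16578');
INSERT INTO article_version VALUES
    (1, '2009-01-01 00:00:00'),
    (1, '2009-01-01 00:00:00'),
    (2, '2009-01-01 00:00:00'),
    (3, '2009-01-01 00:00:00'),
    (4, '2009-01-02 00:00:00');

-- Copy the type next to the date so one index can cover both columns.
ALTER TABLE article_version ADD COLUMN type TEXT;
UPDATE article_version
   SET type = (SELECT a.type FROM articles a
                WHERE a.ID = article_version.ARTI_ID);

-- Hypothetical index name; date first, then type, as suggested above.
CREATE INDEX idx_version_date_type ON article_version (dateAdded, type);
""")

# The count no longer needs the join to articles at all.
denormalized_rows = conn.execute(
    """
    SELECT type, COUNT(DISTINCT ARTI_ID) AS nbrArti
    FROM article_version
    WHERE dateAdded = ?
    GROUP BY type
    ORDER BY type
    """,
    ("2009-01-01 00:00:00",),
).fetchall()
print(denormalized_rows)
```

The price is that every insert into article_version, and every change of an article's type, has to keep the copied column in sync, which is the same write-side cost the question worries about for the view.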
