开发者

Super Slow MySQL - Need Help!

I have a SUPER slow query, which I posted here: http://pastebin.com/E5sdRi7e. When I did an EXPLAIN, I got the following:

id  select_type         table       type    possible_keys  key           key_len  ref                                 rows  Extra
1   PRIMARY             <derived2>  ALL     NULL           NULL          NULL     NULL                                5     Using filesort
2   DERIVED             Workflow    ALL     PRIMARY        NULL          NULL     NULL                                9     Using temporary; Using filesort
2   DERIVED             <derived3>  ALL     NULL           NULL          NULL     NULL                                141   Using where; Using join buffer
2   DERIVED             DataSource  ALL     PRIMARY        NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             <derived4>  ALL     NULL           NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             User        eq_ref  PRIMARY        PRIMARY       4        LatestDataSourceActivityLog.UserId  1
4   DERIVED             t1          ALL     NULL           NULL          NULL     NULL                                5400  Using where; Using temporary; Using filesort
5   DEPENDENT SUBQUERY  t2          ref     DataSourceId   DataSourceId  4        companyname_db.t1.DataSourceId      4
3   DERIVED             DataSource  range   PRIMARY        PRIMARY       4        NULL                                142   Using where

What does the above table tell me? Does it help me identify which fields should be indexed?

Any help is greatly appreciated.

Query

SELECT WrappedData.*
FROM   (SELECT ParentLeafNodeDataSource.Id,
               LatestDataSourceActivityLog.UserId,
               DataSource.Status AS StatusCode,
               ( CASE
                   WHEN User.Name IS NULL THEN 'CompanyName'
                   ELSE User.Name
         开发者_高级运维        END )           AS `Username`,
               Workflow.Name     AS WorkflowName,
               LatestDataSourceActivityLog.Timestamp
        FROM   DataSource,
               Workflow,
               (SELECT *
                FROM   DataSource
                WHERE  DataSource.Id IN ( 0, 1, 2, 3,
                                          4, 5, 6, 7,
                                          8, 9, 10, 11,
                                          12, 13, 16, 21,
                                          22, 23, 24, 25,
                                          26, 27, 28, 29,
                                          30, 31, 32, 33,
                                          34, 35, 36, 37,
                                          38, 39, 40, 41,
                                          42, 43, 44, 45,
                                          46, 47, 48, 49,
                                          50, 51, 52, 53,
                                          54, 55, 56, 57,
                                          58, 59, 60, 61,
                                          62, 63, 64, 65,
                                          66, 67, 68, 69,
                                          70, 71, 72, 73,
                                          74, 75, 76, 77,
                                          78, 79, 80, 81,
                                          83, 84, 85, 86,
                                          87, 88, 89, 90,
                                          91, 92, 93, 94,
                                          95, 96, 97, 98,
                                          99, 100, 101, 102,
                                          103, 104, 105, 106,
                                          107, 108, 109, 110,
                                          111, 112, 113, 114,
                                          115, 116, 117, 118,
                                          119, 120, 142, 1293,
                                          1294, 1295, 1296, 1297,
                                          1298, 1299, 143, 1300,
                                          1301, 1302, 1303, 1304,
                                          1305, 1306, 144, 146,
                                          145, 1307, 1308, 1309,
                                          1310, 147, 149, 148,
                                          150, 151 )) AS ParentLeafNodeDataSource,
               (SELECT t1.*
                FROM   DataSourceActivityLog AS t1
                WHERE  Timestamp = (SELECT Max(t2.Timestamp)
                                    FROM   DataSourceActivityLog AS t2
                                    WHERE  t1.DataSourceId = t2.DataSourceId)
                GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog
               LEFT JOIN User
                 ON User.Id = LatestDataSourceActivityLog.UserId
        WHERE  ParentLeafNodeDataSource.Status = '203'
                OR ParentLeafNodeDataSource.Status = '204'
                   AND Workflow.Id = ParentLeafNodeDataSource.WorkflowId
                   AND LatestDataSourceActivityLog.DataSourceId = ParentLeafNodeDataSource.Id
                   AND DataSource.Id = LatestDataSourceActivityLog.DataSourceId
                   AND LatestDataSourceActivityLog.UserId = 1
        GROUP  BY ParentLeafNodeDataSource.Id) AS WrappedData
ORDER  BY WrappedData.`Timestamp` DESC


It's very difficult to say conclusively, but here are a couple refactoring things.

On performance, the first thing to look at are GROUP functions.

           (SELECT t1.*
            FROM   DataSourceActivityLog AS t1
            WHERE  Timestamp = (SELECT Max(t2.Timestamp)
                                FROM   DataSourceActivityLog AS t2
                                WHERE  t1.DataSourceId = t2.DataSourceId)
            GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog

Which can eliminate the use of MAX entirely

           (SELECT t1.*
            FROM   DataSourceActivityLog AS t1
            WHERE  Timestamp = (SELECT t2.Timestamp
                                FROM   DataSourceActivityLog AS t2
                                WHERE  t1.DataSourceId = t2.DataSourceId
                                ORDER BY t2.Timestamp DESC
                                LIMIT 1)
            GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog

Probably not a big performance issue, but here you can use IFNULL or COALESCE instead of a CASE.

( CASE
    WHEN User.Name IS NULL THEN 'CompanyName'
    ELSE User.Name
END )

Instead

( IFNULL(User.Name,'CompanyName' )

In terms of indexes, they increase SELECT performance by making lookups easier, but they slow down write operations as the indexes have to be updated as well. If your application isn't write-heavy, you should be indexing commonly searched columns, particularly in large tables.

In this query, it looks like you'd benefit by adding an index to DataSourceId, but I can't test if there's any gain. The primary keys will already be indexed.


I would try the following:

  • the outer wrapper is completely useless, putting the ORDER BY in the inner query should work the same
  • try to rewrite the subqueries to be used as JOIN's
  • then move the WHERE-clauses to the relevant JOINS's so the intermediate resultset becomes smaller
  • look at the WHERE and JOIN's which indexes should be made.

A quick try (I'm not sure the result will be the same)

SELECT
    dsa.Status AS StatusCode,
    dsb.Id,
    dsl.UserId,
    dsl.Timestamp
    wf.Name AS WorkflowName,
    COALESCE(u.Name, 'CompanyName') AS `Username`
FROM 
    DataSource dsa
    INNER JOIN DataSource dsb
        ON  dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc ))
        AND dsb.Status = '203' OR dsb.Status = '204'
    INNER JOIN DataSourceActivityLog dsl
        ON  dsl.DataSourceId=dsa.Id
        AND dsl.DataSourceId=dsb.Id
        AND dsl.UserId = 1
        AND dsl.Timestamp=(
            SELECT MAX(t2.Timestamp)
            FROM   DataSourceActivityLog AS dslt
            WHERE  dslt.DataSourceId = dsl.DataSourceId
        )
    INNER JOIN Workflow wf
                   ON  wf.Id = dsb.WorkflowId
    LEFT JOIN User u
        ON u.Id = dsl.UserId
GROUP  BY
    dsl.Id
ORDER  BY
    dsl.Timestamp DESC

Perhaps using the refactor of Zurahn to get rid of the GROUP BY in the subquery

With indexes on:

  • DataSource.WorkFlowId, DataSource.Status
  • DataSourceActivityLog.Timestamp, DataSourceActivityLog.UserId, DataSourceActivityLog.DataSourceId

Ok actually, I came to the conclusion that dsb (originally ParentLeafNodeDataSource) is actually the source of the data, and this could fill the WHERE clause. Personally I try to start with the source of the data, and then JOIN the rest against that. This will usually result in a query where it is easy to understand what is actually selected. Instead of the last JOIN suddenly cutting down on the resultset. So reordering the JOIN's could do that, and it would be something like:

SELECT
    dsa.Status AS StatusCode,
    dsb.Id,
    dsl.UserId,
    dsl.Timestamp
    wf.Name AS WorkflowName,
    COALESCE(u.Name, 'CompanyName') AS `Username`
FROM 
    DataSource dsb
    INNER JOIN Workflow wf
        ON  dsb.WorkflowId=wf.Id
    INNER JOIN DataSourceActivityLog dsl
        ON  dsl.DataSourceId=dsb.Id
        AND dsl.UserId=1
        AND dsl.Timestamp=(
            SELECT MAX(t2.Timestamp)
            FROM   DataSourceActivityLog AS dslt
            WHERE  dslt.DataSourceId = dsl.DataSourceId
        )
    INNER JOIN DataSource dsa
        ON  dsl.DataSourceId=dsa.Id
    LEFT JOIN User u
        ON dsl.UserId=u.Id
WHERE
    dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc ))
    AND dsb.Status = '203' OR dsb.Status = '204'
GROUP  BY
    dsl.Id
ORDER  BY
    dsl.Timestamp DESC


Have you considered the MySql Query Profiler ?

This is how you will understand your performance problems.

Without that step, most people here will sadly prefer writing jokes on your query than trying to help you.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜