开发者

MySQL Query, Subquery optimization, SELECT, JOIN

I have one table with some data and I want select newest data for each type...

Table:

+----+------+------+---------------------+
| ID | data | type | date                |
+----+------+------+---------------------+
|  1 | just |    2 | 2010-08-07 14:开发者_开发问答24:48 |
|  2 | some |    2 | 2010-08-07 18:07:32 |
|  3 | data |    9 | 2010-08-06 02:52:17 |
|  4 | abcd |    1 | 2010-08-08 17:23:22 |
|  5 | efg1 |    5 | 2010-07-10 21:36:55 |
|  6 | c123 |    5 | 2010-07-10 20:44:36 |
|  7 | bbey |   12 | 2010-08-09 09:01:26 |
+----+------+------+---------------------+

Currently I'm using simple subquery and looks like everything works

SELECT `data`,`type`,`date`
FROM `table1`
WHERE `date` = (
                  SELECT MAX( `date` )
                  FROM `table1` AS tbl2
                  WHERE tbl2.`type` = `table1`.`type`
                )
GROUP BY `type`
ORDER BY `type`,`date`

Result:

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+

My question is is there a better way to do this, some optimization, improvement or maybe it's possible to make join?


You are using a correlated subquery. The subquery is dependent on the outer query, and therefore it has to be executed once for each row of the outer query.

In general, this can be improved by using your subquery as a derived table instead. Since a subquery as a derived table is not correlated to the outer query, this solution is considered more scalable:

SELECT    t1.`data`, t1.`type`, t1.`date`
FROM      `table1` t1
JOIN      (
              SELECT   MAX( `date`) `max_date`, `type`
              FROM     `table1`
              GROUP BY `type`
          ) der_t ON (der_t.`max_date` = t1.`date` AND der_t.`type` = t1.`type`)
GROUP BY  t1.`type`
ORDER BY  t1.`type`, t1.`date`;

Test case:

CREATE TABLE table1 (id int, data varchar(10), type int, date datetime); 

INSERT INTO table1 VALUES (1, 'just', 2, '2010-08-07 14:24:48');
INSERT INTO table1 VALUES (2, 'some', 2, '2010-08-07 18:07:32');
INSERT INTO table1 VALUES (3, 'data', 9, '2010-08-06 02:52:17');
INSERT INTO table1 VALUES (4, 'abcd', 1, '2010-08-08 17:23:22');
INSERT INTO table1 VALUES (5, 'efg1', 5, '2010-07-10 21:36:55');
INSERT INTO table1 VALUES (6, 'c123', 5, '2010-07-10 20:44:36');
INSERT INTO table1 VALUES (7, 'bbey', 12, '2010-08-09 09:01:26');

Result:

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+
5 rows in set (0.00 sec)

It also looks like you can avoid subqueries altogether, by using a solution such as the follows:

SELECT     t1.`data`, t1.`type`, t1.`date`
FROM       `table1` t1
LEFT JOIN  `table1` t2 ON (t1.`date` < t2.`date` AND t1.`type` = t2.`type`)
WHERE      t2.`date` IS NULL
GROUP BY   t1.`type`
ORDER BY   t1.`type`, t1.`date`;

In general, this scales even better than the solution with the derived table, but if performance is paramount, you may want to measure both solutions. The article that @Naktibalda provided also provides a few other solutions that you may want to test.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜