MySQL Query, Subquery optimization, SELECT, JOIN
I have one table with some data and I want select newest data for each type...
Table:
+----+------+------+---------------------+
| ID | data | type | date                |
+----+------+------+---------------------+
|  1 | just |    2 | 2010-08-07 14:24:48 |
|  2 | some |    2 | 2010-08-07 18:07:32 |
|  3 | data |    9 | 2010-08-06 02:52:17 |
|  4 | abcd |    1 | 2010-08-08 17:23:22 |
|  5 | efg1 |    5 | 2010-07-10 21:36:55 |
|  6 | c123 |    5 | 2010-07-10 20:44:36 |
|  7 | bbey |   12 | 2010-08-09 09:01:26 |
+----+------+------+---------------------+
Currently I'm using a simple subquery, and it looks like everything works:
SELECT `data`, `type`, `date`
FROM `table1`
WHERE `date` = (
    SELECT MAX(`date`)
    FROM `table1` AS tbl2
    WHERE tbl2.`type` = `table1`.`type`
)
GROUP BY `type`
ORDER BY `type`, `date`
Result:
+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+
My question is: is there a better way to do this, some optimization or improvement, or maybe it is possible to use a join instead?
You are using a correlated subquery. The subquery is dependent on the outer query, and therefore it has to be executed once for each row of the outer query.
In general, this can be improved by using your subquery as a derived table instead. Since a subquery as a derived table is not correlated to the outer query, this solution is considered more scalable:
SELECT t1.`data`, t1.`type`, t1.`date`
FROM `table1` t1
JOIN (
    SELECT MAX(`date`) AS `max_date`, `type`
    FROM `table1`
    GROUP BY `type`
) der_t ON (der_t.`max_date` = t1.`date` AND der_t.`type` = t1.`type`)
GROUP BY t1.`type`
ORDER BY t1.`type`, t1.`date`;
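As a further optimization (my own suggestion, not part of the query above), a composite index on (`type`, `date`) can help here: the derived table's `GROUP BY type` with `MAX(date)` can then be resolved from the index alone, and the join back to `t1` on (`type`, `date`) becomes an index lookup. A sketch, with an index name of my own choosing:

```sql
-- Hypothetical covering index for the greatest-value-per-group lookup;
-- verify with EXPLAIN that your query actually uses it.
ALTER TABLE `table1` ADD INDEX `idx_type_date` (`type`, `date`);
```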
Test case:
CREATE TABLE table1 (id int, data varchar(10), type int, date datetime);
INSERT INTO table1 VALUES (1, 'just', 2, '2010-08-07 14:24:48');
INSERT INTO table1 VALUES (2, 'some', 2, '2010-08-07 18:07:32');
INSERT INTO table1 VALUES (3, 'data', 9, '2010-08-06 02:52:17');
INSERT INTO table1 VALUES (4, 'abcd', 1, '2010-08-08 17:23:22');
INSERT INTO table1 VALUES (5, 'efg1', 5, '2010-07-10 21:36:55');
INSERT INTO table1 VALUES (6, 'c123', 5, '2010-07-10 20:44:36');
INSERT INTO table1 VALUES (7, 'bbey', 12, '2010-08-09 09:01:26');
Result:
+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+
5 rows in set (0.00 sec)
It also looks like you can avoid subqueries altogether, by using a solution such as the following:
SELECT t1.`data`, t1.`type`, t1.`date`
FROM `table1` t1
LEFT JOIN `table1` t2 ON (t1.`date` < t2.`date` AND t1.`type` = t2.`type`)
WHERE t2.`date` IS NULL
GROUP BY t1.`type`
ORDER BY t1.`type`, t1.`date`;
This works because a row of `t1` survives the `LEFT JOIN` with a `NULL` match only when no other row of the same type has a later date, i.e. when it is the newest row for its type. In general, this scales even better than the solution with the derived table, but if performance is paramount, you may want to measure both solutions. The article that @Naktibalda provided also describes a few other solutions that you may want to test.
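For completeness (this was not available when the question was asked): MySQL 8.0 and later support window functions, which express "newest row per type" directly without a self-join. A sketch of that approach:

```sql
-- ROW_NUMBER() ranks each type's rows newest-first;
-- keeping rank 1 yields exactly one row per type
-- (ties on `date` are broken arbitrarily, unlike the join solutions).
SELECT `data`, `type`, `date`
FROM (
    SELECT `data`, `type`, `date`,
           ROW_NUMBER() OVER (PARTITION BY `type` ORDER BY `date` DESC) AS rn
    FROM `table1`
) ranked
WHERE rn = 1
ORDER BY `type`;
```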