Indexing with GROUP BY, ORDER BY, and GROUP_CONCAT
SOLVED SEE BELOW
I am trying to use both GROUP BY
and ORDER BY
in my query where I retrieve data sorted by difficulty. I have to use the GROUP BY
because of the GROUP CONCAT
since some tables such as the 'lookup_peripheral', link multiple values to the same key (content_id). I understand why MYSQL cannot use a index when performing this task since the GROUP BY and ORDER BY statements do not share the same field. However, 开发者_运维问答I am looking for alternative solutions that won't require a day to retrieve the results.
If I omit either the GROUP BY
or ORDER BY
clause, then the database uses an index, but the results lack either all of the peripherals or are not sorted by difficulty.
I am using the 'lookup_difficulty' table in the FROM
so I can use that index in ordering the results. The lookup_xxxxx
tables store each allowed value and then the other tables such as peripheral
link the submission to the value via the content_id. Everything is referenced to the submission content_id
. The content
table holds essential info such as member id, name, etc.
I apologize if my post is not clear enough.
mysql> describe peripheral;
+------------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------+------+-----+---------+-------+
| peripheral_id | int(2) | NO | PRI | NULL | |
| peripheral | char(30) | NO | | NULL | |
| peripheral_total | int(5) | NO | | NULL | |
+------------------+----------+------+-----+---------+-------+
mysql> select * from peripheral;
+---------------+-----------------+------------------+
| peripheral_id | peripheral | peripheral_total |
+---------------+-----------------+------------------+
| 1 | periph 1 | 0 |
| 2 | periph 2 | 1 |
| 3 | periph 3 | 3 |
+---------------+-----------------+------------------+
:
mysql> describe lookup_peripheral;
+---------------+---------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------+------+------+---------+-------+
| content_id | int(10) | NO | INDEX| NULL | |
| peripheral_id | int(2) | NO | | NULL | |
+---------------+---------+------+------+---------+-------+
mysql> mysql> select * from lookup_peripheral;
+------------+---------------+
| content_id | peripheral_id |
+------------+---------------+
| 74 | 2 |
| 74 | 5 |
| 75 | 2 |
| 75 | 5 |
| 76 | 3 |
| 76 | 4 |
+------------+---------------+
The following is not using an index on lookup_difficulty, but rather a table sort and temporary table.
SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral, content.member, .....
FROM (lookup_difficulty)
LEFT OUTER JOIN lookup_peripheral ON lookup_difficulty.content_id = lookup_peripheral.content_id
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
.....
LEFT OUTER JOIN programmer ON programmer.programmer_id = lookup_programmer.programmer_id
LEFT OUTER JOIN lookup_programming_language ON lookup_difficulty.content_id = lookup_programming_language.content_id
GROUP BY lookup_difficulty.content_id
ORDER BY lookup_dfficulty.difficulty_id
LIMIT 30
The ultimate goal is to retrieve results sorted by difficulty with the correct peripherals attached. I think I need a sub-query to achieve this.
EDIT: ANSWER BELOW:
Figured it out. I did what I suscpected I had to do, which was to add a sub-query. Since MYSQL can only use one index per table, I was unable to GROUP BY
and SORT BY
together for my particular setup. Instead, I added another query that would use another index on a different table to group the peripherals together. Here what I added in the SELECT
statement above:
(SELECT group_concat(DISTINCT peripheral.peripheral) as peripheral
FROM lookup_peripheral
LEFT OUTER JOIN peripheral ON peripheral.peripheral_id = lookup_peripheral.peripheral_id
WHERE lookup_difficulty.content_id = lookup_peripheral.content_id
GROUP BY lookup_peripheral.content_id
LIMIT 1) as peripheral
I used a LEFT OUTER
since some entries do not have any peripherals. Total query time is now .02s on a 400MHz processor with 128MB of 100Hz RAM for a 40k row database for most of the tables.
EXPLAIN
now gives me a USING INDEX
for the lookup_difficulty
table. I added this to achieve that:
ALTER TABLE `pictuts`.`lookup_difficulty` DROP PRIMARY KEY ,
ADD PRIMARY KEY ( `difficulty_id` , `content_id` )
Edit 2 I noticed that with large offsets by using pagination, the page will load considerably slower. You may have experienced this with other sites as well. Fortuatly, there is a way to avoid this as pointed out by Peter Zaitsev. Here is my updated snippet to achieve the same timings for offsets of 30K or 0:
FROM
SELECT lookup_difficulty.content_id, lookup_difficulty.difficulty_id
FROM lookup_difficulty
LIMIT '.$offset.', '.$per_page.'
) ld
Now just add ld.whatever
to every JOIN
made and there you have it! My query look like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...
Put in Justin's answer, so this question gets off the unanswered list:
Figured it out. I did what I suspected I had to do, which was to add a sub-query. Since MYSQL can only use one index per table, I was unable to GROUP BY and SORT BY together for my particular setup. Instead, I added another query that would use another index on a different table to group the peripherals together. Here what I added in the SELECT statement above:
(SELECT group_concat(DISTINCT p.peripheral) as peripheral
FROM lookup_peripheral lp
LEFT JOIN peripheral p ON p.peripheral_id = lp.peripheral_id
WHERE ld.content_id = lp.content_id
GROUP BY lp.content_id
LIMIT 1) as peripheral
I used a LEFT OUTER since some entries do not have any peripherals. Total query time is now .02s on a 400MHz processor with 128MB of 100Hz RAM for a 40k row database for most of the tables.
EXPLAIN now gives me a USING INDEX for the lookup_difficulty table. I added this to achieve that:
ALTER TABLE pictuts.lookup_difficulty DROP PRIMARY KEY ,
ADD PRIMARY KEY ( difficulty_id , content_id )
Edit 2 I noticed that with large offsets by using pagination, the page will load considerably slower. You may have experienced this with other sites as well. Fortunately, there is a way to avoid this as pointed out by Peter Zaitsev. Here is my updated snippet to achieve the same timings for offsets of 30K or 0:
FROM
SELECT ld.content_id, ld.difficulty_id
FROM lookup_difficulty ld
LIMIT '.$per_page.' OFFSET '.$offset.'
) ld
Now just add ld.whatever to every JOIN made and there you have it! My query look like a total mess now, but at least it is optimized. I don't think anyone will make it this far in reading this...
精彩评论