Use ORDER BY 'x' with a JOIN, but keep rows that don't have a value for 'x'
This is simplified version of a relatively complex problem that myself and my colleagues can't quite get our heads around.
Consider two tables, table_a
and table_b
. In our CMS table_a
holds metadata for all the data stored in the database, and table_b
has some more specific information, so for simplicity's sake, a title
and date
column.
At the moment our query looks like:
SELECT *
FROM `table_a` LEFT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20
This degrades badly when table_a
has a large amount of rows. If the JOIN is changed RIGHT OUTER JOIN
(which triggers MySQL to use the INDEX set on table_b.date
), the query is infinitely quicker, but it doesn't produce the same result开发者_C百科s (because if table_b.date
doesn't have a value, it is ignored).
This becomes an issue in our CMS because if the user sorts on the date column, any rows that don't have a date set yet disappear from the interface, creating a confusing UI experience and makes it difficult to add dates for the rows that missing them.
Is there a solution that will:
- Use
table_b
.date's INDEX so that the query will scale better - Somehow retain those rows in
table_b
that don't have adate
set so that a user can enter the data
I'm going to second ArtoAle's comment. since the order by
applies to a null value in the outer join for missing rows in table_b
, those rows will be out of order anyway.
The simulated outer join is the ugly part, so lets look at that first. Mysql doesn't have except
, so you need to write the query in terms of exists
.
SELECT table_a.col1, table_a.col2, table_a.col3, ... NULL as table_b_col1, NULL as ...
FROM
table_a
WHERE
NOT EXISTS (SELECT 1 FROM table_a INNER JOIN table_b ON table_a.id = table_b.id);
Which should be UNION ALL
ed with the original query as an inner join. The UNION_ALL
is needed to preserve the original order.
This sort of query is probably going to be dog-slow no matter what you do, because there won't be an index that readily supports a "Foreign Key not present" sort of query. This basically boils down to an index scan in table_a.id with a lookup (Or maybe a parallel scan) for the corresponding row in table_b.id.
So we ended up implemented a different solution that while the results were not as good as using an INDEX, it still provided a nice speed boost of around 25%.
We remove the JOIN and instead used an ORDER BY subquery:
SELECT *
FROM `table_a`
WHERE table_a.col = 'value'
ORDER BY (
SELECT date
FROM table_b
WHERE id = table_a.id
) ASC
LIMIT 0,20
精彩评论