开发者

MySQL join with sort on datetime in joined table

I have 2 large mysql tables: Articles and ArticleTopics. I want to query the DB and retrieve the last 30 articles published for a given topicID. My current query is rather slow. Any ideas on how to improve it?

More details:

The tables:

Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11)      | NO   | PRI | NULL    | auto_increment | 
| title     | varchar(255) | NO   |     | NULL    |                | 
| content   | longtext     | NO   |     | NULL    |                | 
| pubDate   | datetime     | NO   | MUL | NULL    |                | 
+-----------+--------------+------+-----+---------+----------------+


ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11)      | NO   | MUL | NULL    |       | 
| topicId   | int(11)      | NO   | MUL | NULL    |       | 
+-----------+--------------+------+-----+---------+-------+

And my query:

SELECT a.articleId, a.pubDate 
FROM Articles a, ArticleTopics t 
WHERE t.articleId=a.articleId AND t.topicId=3364 
ORDER BY a.pubDate DESC LIMIT 30;

And the EXPLAIN of the query:

+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                开发者_JAVA百科       | key               | key_len | ref               | rows | Extra                                        |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | articleId,topicId,topicId_articleId | topicId_articleId | 4       | const             | 4281 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | a     | eq_ref | PRIMARY,articleId_pubDate           | PRIMARY           | 4       | t.articleId       |    1 |                                              | 
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+

The slowness, I believe, is coming from the ORDER BY a.pubDate DESC. I can greatly improve performance by faking it a bit by instead doing an ORDER BY t.articleId DESC and having an index in ArticleTopics on both articleId & topicId, since in general, the articleIds are in the same order as pubDates. They are not always, however, so it's not ideal. I'd like to be able to sort it on the pubDate.

Update: Added EXPLAIN.


You can rewrite the query in various ways to see if it speeds things up:

SELECT a.articleId, a.pubDate 
FROM Articles a
WHERE a.articleId in (
    select articleId 
    from ArticleTopics
    where topicId = 3364 
)
ORDER BY a.pubDate DESC LIMIT 30;

Or:

SELECT a.articleId, a.pubDate 
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364 
ORDER BY a.pubDate DESC LIMIT 30;

The important index for both queries is on Articles, and contains articleId as first field.

If article is a large table, with say the entire PDF in binary, you can create an index that fully covers the query. Full coverage means all selected fields are part of the index. For this query, a fully covering index would be (articleId, pubDate).


At this point, do you have an index on topicId? If so, does the index contain only the topicId field?

And maybe you can post the output of the EXPLAIN query.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜