Optimize GROUP BY after ranged index query

2023-02-01 07:02 问答作者：

I have a content application that needs to count responses in a time slice, then order them by number of responses. It currently works great with a small data set, but needs to scale to millions rows. My current query won't work.

mysql> describe Responses;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| site_id       | int(10) unsigned    | NO   | MUL | NULL    |       |
| content_id    | bigint(20) unsigned | NO   | PRI | NULL    |       |
| response_id   | bigint(20) unsigned | NO   | PRI | NULL    |       |
| date          | int(10) unsigned    | NO   |     | NULL    |       |
+---------------+---------------------+------+-----+---------+-------+

The table type is InnoDB, the primary key is on (content_id, response_id). There is an additional index on (content_id, date) used to find responses to a piece of content, and another additional index on (site_id, date) used in the query I am have trouble with:

mysql> explain SELECT content_id id, COUNT(response_id) num_responses
               FROM Responses
               WHERE site_id = 1
                 AND date > 1234567890
                 AND date < 1293579867
               GROUP BY content_id
               ORDER BY num_responses DESC
               LIMIT 0, 10;
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table     | type  | possible_keys | key  | key_len | ref  | rows | Extra                                                     |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | Responses | range | date          | date | 8       | NULL |  102 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-----------+-------+---------------+------+---------+------+------+-----------------------------------------------------------+

That's the best I've been able to come up with, but it will end up being in th开发者_如何学运维e 1,000,000's of rows needing to be counted, resulting in 10,000's of rows to sort, to pull in a handful of results.

I can't think of a way to precalculate the count either, as the date range is arbitrary. I have some liberty with changing the primary key: it can be composed of content_id, response_id, and site_id in any order, but cannot contain date.

The application is developed mostly in PHP, so if there is an quicker way to accomplish the same results by splitting the query into subqueries, using temporary tables, or doing things on the application side, I'm open to suggestions.

(Reposted from comments by request)

Set up a table that has three columns: id, date, and num_responses. The column num_responses consists of the number of responses for the given id on the given date. Backfill the table appropriately, and then at around midnight (or later) each night, run a script that adds a new row for the previous day.

Then, to get the rows you want, you can merely query the table mentioned above.

Rather than calculating each time, how about cache the calculated count since the last query, and add the increment of count to update the cache by putting date condition into the WHERE clause?

Have you considered partitioning the table by date? Are there any indices on the table?

继续阅读：group-by optimization sql

Optimize GROUP BY after ranged index query

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？