MySQL: Optimizing COUNT(*) and GROUP BY

2023-03-20 05:15 Q&A author:

I have a simple MyISAM table resembling the following (trimmed for readability -- in reality, there are more columns, all of which are constant width and some of which are nullable):

CREATE TABLE IF NOT EXISTS `history` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `time` int(11) NOT NULL,
  `event` int(11) NOT NULL,
  `source` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `event` (`event`),
  KEY `time` (`time`)
) ENGINE=MyISAM;

Presently the table contains only about 6,000,000 rows (of which currently about 160,000 match the query below), but this is expected to increase. Given a particular event ID and grouped by source, I want to know how many events with that ID were logged during a particular interval of time. The answer to the query might be something along the lines of "Today, event X happened 120 times for source A, 105 times for source B, and 900 times for source C."

The query I concocted does perform this task, but it performs monstrously badly, taking well over a minute to execute when the timespan is set to "all time" and in excess of 30 seconds for as little as a week back:

SELECT COUNT(*) AS count FROM history
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC

This is not for real-time use, so even if the query takes a second or two that would be fine, but several minutes is not. Explaining the query gives the following, which troubles me for obvious reasons:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          history ref     event,time      event   4           const   160399  Using where; Using temporary; Using filesort

I've experimented with various multi-column indexes (such as (event, time)), but with no improvement. This seems like such a common use case that I can't imagine there not being a reasonable solution, but all my Googling boils down to versions of the query I already have, with no particular suggestions on how to avoid the temporary table (or, even with it, why performance is so abysmal).

Any suggestions?

You say you have tried multi-column indexes. Have you also tried single-column indexes, one per column?

UPDATE: Also, the COUNT(*) operation over a GROUP BY clause is probably a lot faster, if the grouped column also has an index on it... Of course, this depends on the number of NULL values that are actually in that column, which are not indexed.

For event, MySQL can do an equality lookup on the index (the ref access type shown in your EXPLAIN), which is quite fast, whereas for time, a range scan will be applied, which is not so fast... If you separate indexes, I'd expect better performance than with multi-column ones.

Also, maybe you could gain something by partitioning your table by some expected values / value ranges:

http://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html
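
As a rough sketch of what range partitioning on time might look like, assuming yearly partitions: the partition names and boundary timestamps below are illustrative and not taken from the question. MySQL requires every unique key, including the primary key, to contain the partitioning column, so the sketch first widens the primary key to (id, time). Pruning only helps when the queried time window is narrow; an "all time" query still has to touch every partition.

```sql
-- Assumption: yearly partitions on the Unix-timestamp column `time`.
-- Every unique key must include the partitioning column, so widen the PK first.
ALTER TABLE `history`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`id`, `time`);

-- Partition names and boundaries are hypothetical examples.
ALTER TABLE `history`
  PARTITION BY RANGE (`time`) (
    PARTITION p_before_2011 VALUES LESS THAN (1293840000),  -- 2011-01-01 00:00:00 UTC
    PARTITION p_before_2012 VALUES LESS THAN (1325376000),  -- 2012-01-01 00:00:00 UTC
    PARTITION p_max         VALUES LESS THAN MAXVALUE
  );

-- MySQL 5.5 syntax for seeing which partitions a query would read.
EXPLAIN PARTITIONS
SELECT source, COUNT(*) AS count
FROM history
WHERE event = 2000 AND time >= 1293840000 AND time < 1310563644
GROUP BY source;
```

With these example boundaries, the partitions column of the EXPLAIN output would be expected to list only p_before_2012 for the window above.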

I suggest trying this multi-column index:

ALTER TABLE `history` ADD INDEX `history_index` (`event` ASC, `time` ASC, `source` ASC);

Then, if that doesn't help, try an index hint on the query:

SELECT COUNT(*) AS count FROM history USE INDEX (history_index)
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC
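
One way to check whether the three-column index really covers the query is to run EXPLAIN again. The sketch below assumes the history_index defined above exists. It also adds source to the select list, which the original query does not have, so that each count is labelled with its source. If the index covers the query, the Extra column should include "Using index". The temporary table and filesort may well remain, since the range on time means rows are not read in source order and the final sort is on a computed count.

```sql
-- Assumption: history_index (event, time, source) has already been created.
-- `source` is added to the select list so each count can be attributed.
EXPLAIN
SELECT source, COUNT(*) AS count
FROM history USE INDEX (history_index)
WHERE event = 2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC;

-- USE INDEX only restricts which indexes the optimizer considers;
-- FORCE INDEX additionally treats a full table scan as very expensive.
EXPLAIN
SELECT source, COUNT(*) AS count
FROM history FORCE INDEX (history_index)
WHERE event = 2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC;
```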

If the sources are known, or you only want the count for specific sources, you can try something like this:

select count(source= 'A' or NULL) as A,count(source= 'B' or NULL) as B from history;

You can do the ordering in your application code. Also try indexing event and source together.

This will definitely be faster than the original query.
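
A slightly fuller sketch of the conditional-count idea, with the event and time filter from the question added back in (the one-liner above omits it). Since source is an INT column in the posted schema, the source IDs 1 and 2 here are hypothetical placeholders.

```sql
-- Assumption: sources 1 and 2 are the ones of interest (hypothetical IDs).
-- (source = 1) evaluates to 1, 0 or NULL; "0 OR NULL" is NULL, and COUNT()
-- skips NULLs, so each COUNT() only counts the rows for its own source.
SELECT
  COUNT(source = 1 OR NULL) AS source_1,
  COUNT(source = 2 OR NULL) AS source_2
FROM history
WHERE event = 2000 AND time >= 0 AND time < 1310563644;
```

This returns a single row with one column per source, so there is no GROUP BY and no ORDER BY for the server to handle. It only works when the list of sources is known in advance.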
