
Optimisation of volatile data querying

I'm trying to solve a problem with latency on queries to a MySQL 5.0 db.

  • The query itself is extremely simple: SELECT SUM(items) FROM tbl WHERE col = 'val'
  • There's an index on col, and there are no more than 10,000 values to sum in the worst case (the mean of count(items) across all values of col is around 10).
  • The table has up to 2M rows.
  • The query is run frequently enough that sometimes the execution time goes up to 10s, although 99% of them take << 1s
  • The query is not really cachable - in almost every case, each query like this one will be followed by an insert to that table within the next minute, and showing old values is out of the question (billing information).
  • keys are good enough - ~100% hits

The result I'm looking for is every single query < 1s. Are there any ways to improve the select time without changes to the table? Alternatively, are there any interesting changes that would help to resolve the problem? I thought about simply having a table where the current sum is updated for every col right after every insert - but maybe there are better ways to do it?


Another approach is to add a summary table:

create table summary ( col varchar(10) primary key, items int not null );

and add some triggers to tbl so that:

on insert:

insert into summary values( new.col, new.items )
on duplicate key update items = items + new.items;

on delete:

update summary set summary.items = summary.items - old.items where summary.col = old.col;

on update:

update summary set summary.items = summary.items - old.items where summary.col = old.col;
update summary set summary.items = summary.items + new.items where summary.col = new.col;
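For reference, here is a minimal sketch of what the complete triggers might look like using MySQL 5.0 trigger syntax (the trigger names are placeholders, and the tbl and summary definitions are assumed to be as above):

delimiter $$

create trigger tbl_after_insert after insert on tbl
for each row
  insert into summary values (new.col, new.items)
  on duplicate key update items = items + new.items$$

create trigger tbl_after_delete after delete on tbl
for each row
  update summary set items = items - old.items where col = old.col$$

create trigger tbl_after_update after update on tbl
for each row
begin
  -- if col itself changed, subtract from the old group and add to the new one
  update summary set items = items - old.items where col = old.col;
  update summary set items = items + new.items where col = new.col;
end$$

delimiter ;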

This will slow down your inserts, but allow you to hit a single row in the summary table for

select items from summary where col = 'val';

The biggest problem with this is bootstrapping the values of the summary table. If you can take the application offline, you can easily initialise summary with values from tbl.

insert into summary select col, sum(items) from tbl group by col;

However, if you need to keep the service running, it is a lot more difficult. If you have a replica, you can stop replication, build the summary table, install the triggers, restart replication, fail the service over to the replica, and then repeat the process on the retired primary.

If you cannot do that, then you could update the summary table one value of col at a time to reduce the impact:

lock tables tbl write, summary write;
delete from summary where col = 'val';
insert into summary select col, sum(items) from tbl where col = 'val' group by col;
unlock tables;

Or if you can tolerate a prolonged outage:

lock tables tbl write, summary write;
delete from summary;
insert into summary select col, sum(items) from tbl group by col;
unlock tables;


A covering index should help:

create index cix on tbl (col, items);

This will enable the sum to be computed from the index alone, without reading from the data file - which should be faster.
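To check that the optimiser actually uses the new index to cover the query, look at the query plan - a covered query shows "Using index" in the Extra column of:

explain select sum(items) from tbl where col = 'val';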

You should also track how effective your key buffer is, and whether you need to allocate more memory for it. This can be done by polling the server status and watching the 'Key%' values:

SHOW STATUS LIKE 'Key%';

MySQL Manual - show status

The ratio of Key_read_requests (i.e. the number of index block read requests) to Key_reads (i.e. the number of requests that required an index block to be read from disk) is important. The higher the number of disk reads, the slower the query will run. You can improve this by increasing the key buffer size in the config file.
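As a rough sketch (the 256M value below is purely illustrative, not a recommendation for this workload), you can read the miss ratio off those counters and enlarge the buffer either at runtime or in the config file:

SHOW STATUS LIKE 'Key_read%';
-- miss ratio = Key_reads / Key_read_requests; ideally well under 1%
-- (the key buffer caches MyISAM index blocks)

SET GLOBAL key_buffer_size = 256 * 1024 * 1024;
-- or persistently, in my.cnf under [mysqld]:
--   key_buffer_size = 256M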
