Optimisation of volatile data querying
I'm trying to solve a problem with latency on a to a mysql-5.0 db.
- The query itself is extremely simple:
SELECT SUM(items) FROM tbl WHERE col = 'val'
- There's an index on
col
and there are not more than 10000 values to sum in the worst case (mean ofcount(items)
for all values ofcol
would be around 10). - The table has up to 2M rows.
- The query is run frequently enough that sometimes the execution time goes up to 10s, although 99% of them take << 1s
- The query is not really cachable - in almost every case, each query like this one will开发者_如何学JAVA be followed by an insert to that table in the next minute and showing old values is out of question (billing information).
- keys are good enough - ~100% hits
The result I'm looking for is every single query < 1s. Are there any ways to improve the select time without changes to the table? Alternatively, are there any interesting changes that would help to resolve the problem? I thought about simply having a table where the current sum is updated for every col right after every insert - but maybe there are better ways to do it?
Another approach is to add a summary table:
create table summary ( col varchar(10) primary key, items int not null );
and add some triggers to tbl so that:
on insert:
insert into summary values( new.col, new.items )
on duplicate key update set summary.items = summary.items + new.items;
on delete:
update summary set summary.items = summary.items - old.items where summary.col = old.col
on update:
update summary set summary.items = summary.items - old.items where summary.col = old.col;
update summary set summary.items = summary.items + new.items where summary.col = new.col;
This will slow down your inserts, but allow you to hit a single row in the summary table for
select items from summary where col = 'val';
The biggest problem with this is bootstrapping the values of the summary table. If you can take the application offline, you can easily initialise summary with values from tbl.
insert into summary select col, sum(items) from tbl group by col;
However, if you need to keep the service running, it is a lot more difficult. If you have a replica, you can stop replication, build the summary table, install the triggers, restart replication, then failover the service to using the replica, and then repeat the process on the retired primary.
If you cannot do that, then you could update the summary table one value of col at a time to reduce the impact:
lock table write tbl, summary;
delete from summary where col = 'val';
insert into summary select col, sum(items) from tbl where col = 'val';
unlock tables;
Or if you can tolerate a prolonged outage:
lock table write tbl, summary;
delete from summary;
insert into summary select col, sum(items) from tbl group by col;
unlock tables;
A covering index should help:
create index cix on tbl (col, items);
This will enable the sum to be performed without reading from the data file - which should be faster.
You should also track how effective your key-buffer is, and whether you need to allocate more memory for it. This can be done by polling the server status and watching the 'key%' values:
SHOW STATUS LIKE 'Key%';
MySQL Manual - show status
The ratio between key_read_requests (ie. the number of index lookups) versus key_reads (ie. number of requests that required index blocks to be read from disk) is important. The higher the number of disk reads, the slower the query will run. You can improvethis by increasing the keybuffer size in the config file.
精彩评论