MySQL: how can I improve performance for this SELECT statement?
I have created a SQL table in which I index files on disk. The table has over 1 million records. I have added indexes on ext and size, but this query, which reports how much space each extension uses, still takes over a minute to run. How can I improve the performance of this SELECT?
select ext,
ROUND((sum(size) / (1073741824))) as TotalSizeGB,
count(*) as Count
from fileindex
group by ext
order by TotalSizeGB desc;
EXPLAIN output:
|| *id* || *select_type* || *table* || *type* || *possible_keys* || *key* || *key_len* || *ref* || *rows* || *Extra* ||
|| 1 || SIMPLE || fileindex || index || _NULL_ || ext || 27 || _NULL_ || 1892234 || Using index; Using temporary; Using filesort ||
The query as written is always going to hit every row in the table, so there is a real limit to how quickly it can run. If you need this result returned quickly, you might want to add another table that keeps the total size per ext, and update it with triggers whenever an operation takes place on your main table.
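A minimal sketch of such a summary table (the table and column names here are illustrative, not from the original post):

-- Hypothetical summary table: one row per extension.
CREATE TABLE ext_summary (
    ext        VARCHAR(25) NOT NULL PRIMARY KEY,
    total_size BIGINT      NOT NULL DEFAULT 0,
    file_count BIGINT      NOT NULL DEFAULT 0
);

-- Backfill once from the existing data; afterwards, triggers keep it current.
INSERT INTO ext_summary (ext, total_size, file_count)
SELECT ext, SUM(size), COUNT(*)
FROM fileindex
GROUP BY ext;

Reading the report then means scanning at most a few hundred summary rows instead of 1.9 million file rows.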
Use MySQL triggers so that, as rows are INSERTed into fileindex, something like UPDATE meta SET value = value + NEW.size WHERE name = 'fileindex.count'; runs automatically.
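The triggers below assume a meta table with name and value columns; a minimal sketch (the column types here are assumptions):

CREATE TABLE meta (
    name  VARCHAR(64) NOT NULL PRIMARY KEY,  -- e.g. 'fileindex.exe'
    value BIGINT      NOT NULL DEFAULT 0     -- running total of size, in bytes
);

-- Seed one row per extension from the current contents of fileindex.
INSERT INTO meta (name, value)
SELECT CONCAT('fileindex.', ext), SUM(size)
FROM fileindex
GROUP BY ext;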
DELIMITER |

DROP TRIGGER IF EXISTS fileindexafterinsert|
CREATE TRIGGER fileindexafterinsert AFTER INSERT ON fileindex
FOR EACH ROW BEGIN
    -- Add the new file's size to the running total for its extension.
    UPDATE meta SET value = value + NEW.size WHERE name = CONCAT('fileindex.', NEW.ext);
END|

DROP TRIGGER IF EXISTS fileindexafterdelete|
CREATE TRIGGER fileindexafterdelete AFTER DELETE ON fileindex
FOR EACH ROW BEGIN
    -- Subtract the deleted file's size from the running total for its extension.
    UPDATE meta SET value = value - OLD.size WHERE name = CONCAT('fileindex.', OLD.ext);
END|

DELIMITER ;
Then you can simply do SELECT * FROM meta WHERE name = 'fileindex.exe' LIMIT 1, which should return in less than 0.01s.
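If you still want the full per-extension report, a query like the following over meta should also stay fast (a sketch; note that the triggers above only maintain the size total, so the Count column from the original query would need its own counter column and trigger):

SELECT SUBSTRING(name, LENGTH('fileindex.') + 1) AS ext,
       ROUND(value / 1073741824) AS TotalSizeGB
FROM meta
WHERE name LIKE 'fileindex.%'
ORDER BY TotalSizeGB DESC;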
As I can't see any obvious flaw in your MySQL syntax, if you want it faster than that I would suggest going NoSQL and using a document database that supports map-reduce, such as Hadoop or CouchDB. You could host this on a cluster (read: hundreds) of machines on EC2 (OK, I'm joking, but seriously, you can run one node per CPU core for maximum speed on one box).
Your query is going to hit every record in the table, so you don't want to slow things down by first hitting an index and then the table, as that leads to 2 I/Os per record (1 for the index and 1 for the actual table data).
So the first question becomes: how can you speed up a full table scan?
Tune I/O. Are your disks fast, defragmented, and not shared (with other data, applications, etc.)?
Alternatively, consider denormalisation: e.g. a trigger on your table that counts and sums the appropriate data on each insert, update and delete and stores the result in another table. Then query for the single row of data in that other table.
Add a covering index, which will basically keep all the columns you need in memory. Note that TotalSizeGB is an alias computed from size, not a real column, so the index only needs ext and size. I would recommend:
ALTER TABLE fileindex ADD INDEX covering (ext, size);
That should work well (hopefully).
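To confirm the optimizer uses it, you can re-run EXPLAIN on the original query; with a covering index you would expect key to name the new index and Extra to still show Using index (this is an expected outcome, not output from the original post):

EXPLAIN
select ext,
       ROUND((sum(size) / (1073741824))) as TotalSizeGB,
       count(*) as Count
from fileindex
group by ext
order by TotalSizeGB desc;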