Speed up MySQL query containing 300k+ records
I need to look up the latest stock quantity for each of my products (SKUs).
I have one table (called "stock") with 315k+ records containing this information (a new batch of data is added every day, for most SKUs). The reference data is in another table (called "stockfile"). This is the query to do it:
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON ( s1.stockfileid = f1.stockfileid )
LEFT OUTER JOIN ( stock s2
JOIN stockfile f2 ON ( s2.stockfileid = f2.stockfileid )
) ON ( s1.sku = s2.sku
AND ( f1.date < f2.date
OR f1.date = f2.date
AND f1.stockfileid < f2.stockfileid) )
WHERE s2.sku IS NULL
These are the table definitions.
SHOW CREATE TABLE stock:
CREATE TABLE `stock` (
`stockid` bigint(20) NOT NULL AUTO_INCREMENT,
`sku` char(25) NOT NULL,
`quantity` int(5) NOT NULL,
`creationdate` datetime NOT NULL,
`stockfileid` smallint(5) unsigned NOT NULL,
`touchdate` datetime NOT NULL,
PRIMARY KEY (`stockid`),
KEY `stock_sku` (`sku`),
KEY `stock_stockfileid` (`stockfileid`)
) ENGINE=MyISAM AUTO_INCREMENT=316039 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE `stockfile` (
`stockfileid` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`filename` varchar(25) NOT NULL,
`creationdate` datetime DEFAULT NULL,
`touchdate` datetime DEFAULT NULL,
`date` datetime DEFAULT NULL,
`begindate` datetime DEFAULT NULL,
`enddate` datetime DEFAULT NULL,
PRIMARY KEY (`stockfileid`),
KEY `stockfile_date` (`date`)
) ENGINE=MyISAM AUTO_INCREMENT=266 DEFAULT CHARSET=latin1
Without any extra indexes it takes... forever. I added these and it sped up to about 250 seconds:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
This is the EXPLAIN on the original query, with these indexes:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE s1 ALL stock_stockfileid NULL NULL NULL 316038
1 SIMPLE f1 eq_ref PRIMARY PRIMARY 2 kompare.s1.stockfileid 1
1 SIMPLE s2 ref stock_sku,stock_stockfileid stock_sku 25 kompare.s1.sku 12 Using where
1 SIMPLE f2 eq_ref PRIMARY,stockfile_date PRIMARY 2 kompare.s2.stockfileid 1
Is there another way to speed things up?
- Thanks to Bill Karwin for solving the original query!
I'm not sure I got your query right, but if it's safe to suppose that the maximal date also has the maximal stockfileid (as your OR condition half-suggests), maybe something like this query would help:
SELECT s1.*, f1.*
FROM
stock s1 JOIN stockfile f1 USING (stockfileid)
JOIN (
SELECT sku, max(date) AS maxdate, max(stockfileid) AS maxfileid
FROM stock JOIN stockfile USING (stockfileid)
GROUP BY sku
) AS dfi ON (s1.sku,f1.date,f1.stockfileid)=(dfi.sku,maxdate,maxfileid);
I'm not sure whether this is what you want, or whether it's faster, but it should be. On the other hand, you don't need to take the date into account at all if the fileid carries the same ordering. Anyway, I think this kind of prefiltering may help as a starting point — see the sketch below.
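Following that thought: if stockfileid really does increase with date, you can drop the date columns entirely and group on stockfileid alone. A minimal sketch of that simplification (same tables as above, untested against your data):
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 USING (stockfileid)
JOIN (
    SELECT sku, MAX(stockfileid) AS maxfileid
    FROM stock
    GROUP BY sku
) AS m ON s1.sku = m.sku AND s1.stockfileid = m.maxfileid;
The derived table only touches stock, so a composite index on (sku, stockfileid) would let MySQL satisfy it from the index alone, without joining stockfile 315k times.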
The default values in my.cnf typically are set for systems with VERY little memory by today's standards. If you are using those default values, that may be the single best place to look for performance gains. Ensure you are allocating all of the memory you can spare to MySQL.
mysqltuner can make good starting recommendations for allocating memory between the various parts of MySQL that can use it.
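For example, a starting point might look like this — the numbers are purely illustrative and assume a dedicated box with a few GB of RAM to spare; adjust them to what your machine can actually afford:
[mysqld]
# key_buffer_size is the main cache for MyISAM indexes (both tables here are MyISAM)
key_buffer_size = 512M
# in-memory temporary tables, relevant for GROUP BY / derived-table queries;
# these two must be raised together
tmp_table_size = 64M
max_heap_table_size = 64M
# per-connection buffers; keep these modest, they multiply by connection count
sort_buffer_size = 4M
read_rnd_buffer_size = 2M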
If you created your indices before adding most of the data, you may see vast improvement by performing ANALYZE TABLE on your tables. I saw one query drop from 24 seconds to 1 second just by doing that.
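For the two tables in the question that is simply:
ANALYZE TABLE stock, stockfile;
This refreshes the index statistics the optimizer uses to choose join order and access paths, which matters most when the indexes were created on an already-populated table or the data has grown substantially since.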
Your EXPLAIN indicates that MySQL is doing a table scan to satisfy WHERE s2.sku IS NULL prior to narrowing the search. That's very expensive.
f1.date < f2.date
OR f1.date = f2.date
AND f1.stockfileid < f2.stockfileid
should be able to be re-written as the row comparison
(f1.date, f1.stockfileid) < (f2.date, f2.stockfileid)
though I doubt that matters to the optimizer. (Note that collapsing just the two date tests into f1.date <= f2.date would not be equivalent, because the AND branch applies only to the equality case.)
Could you explain in plain English what you are trying to do with the query? That might help shed light on how it could be simplified.
Normalizing the data would go far in speeding up the queries. Furthermore, if you are running on a slow machine, that will adversely affect how quickly your results are returned. Show me an example query against this table and I can better understand what you are attempting from that angle.
I'm not sure if this is something you could do with your app, but instead of computing the latest quantity for each sku every single time you run the query, it would be more efficient to store the sku and quantity in a separate table and then just update that data whenever a new stockfile is received. That way you incur the cost of the calculation once per stockfile instead of once per query. It's a bit of an upfront cost to compute, but it saves you a lot down the line.
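A minimal sketch of that idea — the summary table name and the @newfileid variable are illustrative, not part of the original schema:
-- one row per sku, holding the most recent quantity
CREATE TABLE latest_stock (
    sku char(25) NOT NULL,
    quantity int NOT NULL,
    stockfileid smallint unsigned NOT NULL,
    PRIMARY KEY (sku)
) ENGINE=MyISAM;
-- after loading a new stockfile, refresh the summary in one statement
-- (@newfileid is the id of the batch that was just loaded)
REPLACE INTO latest_stock (sku, quantity, stockfileid)
SELECT sku, quantity, stockfileid
FROM stock
WHERE stockfileid = @newfileid;
Reads then become a trivial primary-key lookup on latest_stock instead of a 315k-row self-join.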