Slow mysql query / Possible query problem
I have 2 (similar) queries:
-- Query #1 - get all new products not in currently in the Product table
-- Should match any products in the temp table that do not exist in the Product table
INSERT
INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, \'Activated\'
FROM `tmpImport` t
LEFT JOIN `Product` p USING (`ProductId`)
WHERE p.`ProductId` IS NULL
ON DUPLICATE KEY UPDATE
`State` = VALUES(`State`)
-- Query #2 - get all Products that are removed from the Product table
-- Should match any products in the Product table that do not exist in the temp table
INSERT
INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, \'Deactivated\'
FROM `Product` p
LEFT JOIN `tmpImport` t USING (`ProductId`)
WHERE t.`ProductId` IS NULL
ON DUPLICATE KEY UPDATE
`State` = VALUES(`State`)
On first run, (day 1), when the Product table is empty, both queries run in under 1 second, however, on second run, (day 2), when the Product table has 14000 records, the first query runs in under 2 seconds, and the second query runs in 244 seconds. Each consecutive data import is the same (240-250 seconds for query # 2. On checking the database, all data seems correct, I just can't figure out why the second query is taking so long.
---> Edit: Slow Query log:
# Query_time: 245.328784 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 187711973
SET timestamp=1305151558;
INSERT
INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t USING (`ProductId`)
WHERE t.`ProductId` IS NULL
ON DUPLICATE KEY UPDATE
`State` = VALUES(`State`);
What concerns me most at this point: Rows_examined: 187711973 (how exactly is it examining that many rows?) The Product table size is ~14000 records, and the import table size is ~28000 records, and the tmpProductState is at most 60 records.
---> Another Edit:
EXPLAIN SELECT p.`ProductId` , p.`LastSeenDate` , 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t
USING ( `ProductId` )
WHERE t.`ProductId` IS NULL
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p ALL NULL NULL NULL NULL 14151
1 SIMPLE t index NULL PRIMARY 100 NULL 28166 Using where; Using index; Not exists
Tables Involved:
CREATE TABLE IF NOT EXISTS `tmpImport` (
`CategoryId` smallint(5) unsigned NOT NULL,
`ProcessedOn` date DEFAULT NULL,
`ProductId` varchar(32) NOT NULL,
`Title` 开发者_如何学运维 varchar(255) DEFAULT NULL,
`Description` text,
`ActivateDate` date DEFAULT NULL,
PRIMARY KEY (`CategoryId`,`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET = UTF8
CREATE TABLE IF NOT EXISTS `tmpProductState` (
`ProductId` VARCHAR(32) NOT NULL,
`ChangedOn` DATE NOT NULL,
`State` ENUM(\'Activated\',\'Deactivated\'),
PRIMARY KEY(`ProductId`,`ChangedOn`)
) ENGINE = Memory
CREATE TABLE `Product` (
`ProductId` varchar(32) NOT NULL,
`Title` varchar(255) DEFAULT NULL,
`Description` text,
`ActivateDate` date DEFAULT NULL,
`LastSeenDate` date DEFAULT NULL,
PRIMARY KEY (`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Your tables are not normalized, you don't have effective indexes, and your joins are... unusual.
I assume you can't do anything about the gross duplication of data across tables, so lets ignore that.
Looks like you are duplicating data across columns in the tables you are joining, you should use all of those columns in your join. So it should probably be:
LEFT JOIN `tmpImport` t USING (`ProductId`, `Title`, `Description`, `ActivateDate`)
Add indexes to your tables that correspond to the fields you are joining or filtering on. DO NOT have a compound key as your primary key. Instead add an auto incrementing field to be the PK, and have a unique key if you need to enforce uniqueness. Both the product table and tmpImport should have keys for each column being joined.
Hope some of these ideas help you out.
Very late reply to this, but your first query is getting all the records from tmpImport and they getting all the records from Product using the primary key on the Product table. This is pretty efficient. The second query is getting all the records from Product and then getting the matching records from tmpImport but without the benefit of any index on ProductId on tmpImport. Hence runs badly.
Add an index on ProductId to the tmpImport table (the ProductId in the primary key is ignored for this join as it is not the first column in the key and you are not using CategoryId which is the first ).
精彩评论