开发者

Slow mysql query / Possible query problem

I have 2 (similar) queries:

-- Query #1 - get all new products not in currently in the Product table
-- Should match any products in the temp table that do not exist in the Product table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, \'Activated\'
  FROM `tmpImport` t
  LEFT JOIN `Product` p USING (`ProductId`)
 WHERE p.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

-- Query #2 - get all Products that are removed from the Product table
-- Should match any products in the Product table that do not exist in the temp table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, \'Deactivated\'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

On first run, (day 1), when the Product table is empty, both queries run in under 1 second, however, on second run, (day 2), when the Product table has 14000 records, the first query runs in under 2 seconds, and the second query runs in 244 seconds. Each consecutive data import is the same (240-250 seconds for query # 2. On checking the database, all data seems correct, I just can't figure out why the second query is taking so long.

---> Edit: Slow Query log:

# Query_time: 245.328784  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 187711973
SET timestamp=1305151558;

INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`);

What concerns me most at this point: Rows_examined: 187711973 (how exactly is it examining that many rows?) The Product table size is ~14000 records, and the import table size is ~28000 records, and the tmpProductState is at most 60 records.

---> Another Edit:

EXPLAIN SELECT p.`ProductId` , p.`LastSeenDate` , 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t
USING ( `ProductId` )
WHERE t.`ProductId` IS NULL 

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          p       ALL     NULL            NULL    NULL            NULL    14151
1   SIMPLE          t       index   NULL            PRIMARY 100             NULL    28166   Using where; Using index; Not exists

Tables Involved:

CREATE TABLE IF NOT EXISTS `tmpImport` (
  `CategoryId`             smallint(5) unsigned NOT NULL,
  `ProcessedOn`            date DEFAULT NULL,
  `ProductId`              varchar(32) NOT NULL,
  `Title`        开发者_如何学运维          varchar(255) DEFAULT NULL,
  `Description`            text,
  `ActivateDate`           date DEFAULT NULL,
  PRIMARY KEY (`CategoryId`,`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET = UTF8

CREATE TABLE IF NOT EXISTS `tmpProductState` (
  `ProductId` VARCHAR(32) NOT NULL,
  `ChangedOn` DATE NOT NULL,
  `State` ENUM(\'Activated\',\'Deactivated\'),
  PRIMARY KEY(`ProductId`,`ChangedOn`)
) ENGINE = Memory

CREATE TABLE `Product` (
  `ProductId` varchar(32) NOT NULL,
  `Title` varchar(255) DEFAULT NULL,
  `Description` text,
  `ActivateDate` date DEFAULT NULL,
  `LastSeenDate` date DEFAULT NULL,
  PRIMARY KEY (`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8


Your tables are not normalized, you don't have effective indexes, and your joins are... unusual.

I assume you can't do anything about the gross duplication of data across tables, so lets ignore that.

Looks like you are duplicating data across columns in the tables you are joining, you should use all of those columns in your join. So it should probably be:

LEFT JOIN `tmpImport` t USING (`ProductId`, `Title`, `Description`, `ActivateDate`)

Add indexes to your tables that correspond to the fields you are joining or filtering on. DO NOT have a compound key as your primary key. Instead add an auto incrementing field to be the PK, and have a unique key if you need to enforce uniqueness. Both the product table and tmpImport should have keys for each column being joined.

Hope some of these ideas help you out.


Very late reply to this, but your first query is getting all the records from tmpImport and they getting all the records from Product using the primary key on the Product table. This is pretty efficient. The second query is getting all the records from Product and then getting the matching records from tmpImport but without the benefit of any index on ProductId on tmpImport. Hence runs badly.

Add an index on ProductId to the tmpImport table (the ProductId in the primary key is ignored for this join as it is not the first column in the key and you are not using CategoryId which is the first ).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜