Slow dependent subquery - how to improve performance?

My database has a table named transactions with 20,000 records. When I run this query

SELECT T1.* FROM transactions AS T1
WHERE T1.ppno IN 
  (SELECT T2.PPNO FROM transactions AS T2 
   WHERE T2.ppno = T1.ppno 
   HAVING COUNT(T2.ppno) = $doublescount) 
 ORDER BY T1.ppno,T1.numb

it takes at least 3 minutes to run. How can I speed up this query?

Edit

SHOW CREATE TABLE transactions returns:

CREATE TABLE `transactions` (
  `eidx` int(10) unsigned NOT NULL,
  `numb` int(10) unsigned NOT NULL,
  `date` date NOT NULL,
  `time` varchar(45) NOT NULL,
  `name` varchar(45) NOT NULL,
  `add1` varchar(45) NOT NULL,
  `add2` varchar(45) NOT NULL,
  `city` varchar(45) NOT NULL,
  `phno` varchar(45) NOT NULL,
  `nati` varchar(45) NOT NULL,
  `ppno` varchar(45) NOT NULL,
  `cuam` varchar(45) NOT NULL,
  `tcam` varchar(45) NOT NULL,
  `valu` varchar(45) NOT NULL,
  `srch` varchar(45) NOT NULL,
  `stax` varchar(45) NOT NULL,
  `taxp` varchar(45) NOT NULL,
  `roun` varchar(45) NOT NULL,
  `amnt` varchar(45) NOT NULL,
  `encd` varchar(45) NOT NULL,
  `mocd` varchar(45) NOT NULL,
  `endt` varchar(45) NOT NULL,
  `modt` varchar(45) NOT NULL,
  `sflg` varchar(5) NOT NULL,
  `category` varchar(45) NOT NULL DEFAULT 'NA',
  `branch` varchar(10) NOT NULL,
  PRIMARY KEY (`numb`,`branch`,`date`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED


Instead of using an IN condition, use a LEFT JOIN with an IS NULL check:

http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/


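Dependent subqueries can be slow because MySQL may re-run the inner query for each row of the outer query. Use a JOIN on a temporary (derived) table containing all the ppno values that satisfy the condition. The GROUP BY makes COUNT() apply per ppno rather than to the whole table.

SELECT T1.* FROM transactions AS T1
JOIN (SELECT T2.ppno FROM transactions AS T2 GROUP BY T2.ppno HAVING COUNT(T2.ppno) = $doublescount) AS temp ON temp.ppno = T1.ppno
ORDER BY T1.ppno, T1.numb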
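
To see what the derived-table join returns, here is a minimal sketch on a toy table. The table name transactions_demo, its two columns, and the sample rows are hypothetical and only stand in for the real schema; the literal 2 stands in for $doublescount. The inner query is written with an explicit GROUP BY so the count is taken per ppno.

```sql
-- Hypothetical toy table: only the columns the query touches.
CREATE TABLE transactions_demo (
  numb INT UNSIGNED NOT NULL,
  ppno VARCHAR(45) NOT NULL,
  PRIMARY KEY (numb)
);

INSERT INTO transactions_demo (numb, ppno) VALUES
  (1, 'A100'), (2, 'A100'),               -- 'A100' occurs twice
  (3, 'B200'),                            -- 'B200' occurs once
  (4, 'C300'), (5, 'C300'), (6, 'C300');  -- 'C300' occurs three times

-- The derived table "temp" holds one row per ppno that occurs exactly
-- 2 times; the outer query then fetches every row with such a ppno.
SELECT T1.*
FROM transactions_demo AS T1
JOIN (
  SELECT T2.ppno
  FROM transactions_demo AS T2
  GROUP BY T2.ppno
  HAVING COUNT(*) = 2
) AS temp ON temp.ppno = T1.ppno
ORDER BY T1.ppno, T1.numb;
-- Returns the two 'A100' rows (numb 1 and numb 2).
```

The inner SELECT is evaluated once as a whole, not once per outer row, which is the point of moving the condition into a joined derived table.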


Change the SELECT T1.* FROM transactions AS T1 so it fetches only the columns you need, e.g. SELECT T1.ppno, T1.name FROM transactions AS T1, and then use the join method courtesy of Gerben.

When SELECT * is used, the database system has to work out which columns are in the table and then allocate memory for each column and row, which is quite a lot of background work before the query is run. With named columns, the database system only needs to handle those columns, so there is less background work and less data to return.
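
As a sketch, the named-column version combined with a join on a grouped subquery might look like this. The choice of ppno, numb and name is only an assumption about which columns the application needs, and the literal 2 stands in for $doublescount.

```sql
-- Fetch only the columns the application uses (assumed here: ppno, numb, name).
SELECT T1.ppno, T1.numb, T1.name
FROM transactions AS T1
JOIN (
  -- One row per ppno that occurs exactly 2 times in the table.
  SELECT T2.ppno
  FROM transactions AS T2
  GROUP BY T2.ppno
  HAVING COUNT(*) = 2
) AS temp ON temp.ppno = T1.ppno
ORDER BY T1.ppno, T1.numb;
```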

If your query is taking 3 minutes to execute, you probably have a Cartesian join happening, where the intermediate result set could run into millions of rows. Gerben's join method prevents this by building a temporary table from the subquery result once and joining it to the main table, so the main query works against a much smaller result set.
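
One way to check what the server does with each form is to prefix the query with EXPLAIN and compare the select_type column. This is only a sketch: the literal 2 stands in for $doublescount, and the exact plan depends on the MySQL version and the data, so no output is shown.

```sql
-- Original form. A select_type of DEPENDENT SUBQUERY means the inner
-- query depends on the outer query and may be re-evaluated per outer row.
EXPLAIN
SELECT T1.* FROM transactions AS T1
WHERE T1.ppno IN
  (SELECT T2.ppno FROM transactions AS T2
   WHERE T2.ppno = T1.ppno
   HAVING COUNT(T2.ppno) = 2)
ORDER BY T1.ppno, T1.numb;

-- Join form. A select_type of DERIVED marks the subquery in the FROM
-- clause, which is computed as its own result set and then joined.
EXPLAIN
SELECT T1.* FROM transactions AS T1
JOIN (
  SELECT T2.ppno
  FROM transactions AS T2
  GROUP BY T2.ppno
  HAVING COUNT(*) = 2
) AS temp ON temp.ppno = T1.ppno
ORDER BY T1.ppno, T1.numb;
```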


If the bottleneck is the database, you can also leave out the ORDER BY and do the ordering in the application instead of the DB.
