开发者

How do you approach optimizing this complex sql query then choosing proper table index's

I have a complex query that I can not seem to optimize with proper index's. Any ideas on how to modify the query or index's to optimize. The columndata a table has 55000 records, the drows d has 4000 records the rest of the tables less than 25 records. How should you approach optimizing a problem like this:

SELECT DISTINCT a.id,
       a.string_data,
       b.grid_id, 
       c.data_type,    
       d.document_id
  FROM  `columndata` a, 
        `columngrid` b, 
        `dcolumns` c, 
        `drows` d  
  WHERE b.grid_id = 9 
    AND d.document_id = 17
    AND d.id = a.row_number   
    AND b.column_id = a.column_id
    AND c.id = a.column_id     
    AND 0 = (SELECT count(1) AS q
               FROM `security` e, 
                    `role_userlist` f,
                    `user_type_defaults`g                                        
              WHERE ((e.access_for = 1 
                AND e.access_for_id = 0)
                 OR (e.access_for = 2
                AND e.access_for_id = f.role_id
                AND f.userid = 0)
                 OR (e.access_for = 3
                 AND e.access_for_id = g.id
                 AND (g.usertype_name =""
                  OR (g.usertype_name = "Guest"
                 AND 0 = 0)))) 
                 AND e.access_level = 0
                 AND ((e.access_type = 2
                 AND e.access_subtype_grid_id = b.grid_id
                 AND e.access_subtype_column_id = a.column_id)                                                          
                  OR  (e.access_type = 4 
                 AND e.access_subtype_document_id = a.document_id
                 AND e.access_subtype_column_id = a.column_id))) 
ORDER BY d.ordering, b.ordering LIMIT 0, 330

Tables

CREATE TABLE `columndata` (  
  `id` int(11) NOT NULL auto_increment,  
  `document_id` int (11) NOT NULL,  
  `column_id` int(11) NOT NULL,  
  `row_number` int开发者_Python百科(11) NOT NULL,  
  `string_data` varchar (5000),  
  PRIMARY KEY  (`id`),  
  INDEX(`column_id`,`row_number`,`document_id`),  
  INDEX(`row_number`),  
  INDEX(`document_id`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `columngrid` (  
  `id` int(11) NOT NULL auto_increment,  
  `parent_id` int(11),  
  `column_id` int(11)  NOT NULL,  
  `grid_id` int(11) NOT NULL,  
  `ordering` int(11) NOT NULL,  
  PRIMARY KEY  (`id`),  
  INDEX (`parent_id`),  
  INDEX (`grid_id`,`column_id`,`ordering`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `dcolumns` (  
  `id` int(11) NOT NULL auto_increment,  
  `header` varchar(25) NOT NULL,  
  `data_type` varchar (25) NOT NULL default 'T',                                                         
  PRIMARY KEY  (`id`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `drows` (  
  `id` int(11) NOT NULL auto_increment,  
  `parent_id` int(11),  
  `document_id` int (11) NOT NULL,  
  `grid_id` int(11) NOT NULL,  
  `ordering` int(11) NOT NULL,  
  PRIMARY KEY  (`id`),  
  INDEX (`parent_id`),  
  INDEX (`document_id`,`id`,`ordering`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `security` (    
  `id` int(11) NOT NULL auto_increment,    
  `access_for` int(11) NOT NULL,    
  `access_for_id` int(11) NOT NULL,  
  `access_type` int(11) NOT NULL,  
  `access_type_id` varchar(11) NOT NULL,  
  `access_subtype_grid_id` int(11) NULL,  
  `access_subtype_column_id` int(11) NULL,  
  `access_subtype_document_id` int(11) NULL,  
  `access_level` int(4) default 0,  
  PRIMARY KEY  (`id`),  
  INDEX `ind1` (`access_for`,`access_for_id`),  
  INDEX `ind2` (`access_type`,`access_type_id`),  
  INDEX `ind3` (`access_type`,`access_subtype_grid_id`,`access_subtype_column_id`),  
  INDEX `ind4` (`access_type`,`access_subtype_document_id`,`access_subtype_column_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `role_userlist` (  
  `id` int(11) NOT NULL auto_increment,  
  `userid` int(11) NOT NULL,  
  `role_id` int(11) NOT NULL,  
  `userid_assigning_role` int(11) NOT NULL,  
  PRIMARY KEY  (`id`),  
  INDEX (`role_id`),  
  INDEX (`userid`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  

CREATE TABLE `#__jgrid_user_type_defaults` (  
  `id` int(11) NOT NULL auto_increment,  
  `usertype_name` varchar(25) NOT NULL,  
  `access_level` int(11),  
  PRIMARY KEY  (`id`),  
  INDEX `ind1` (`usertype_name`)  
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;  


Have you tried to use table JOINS instead of writing endless where statements? It is not always convenient to fetch everything in single query, as it can decrees overall performance.

for example, I had complex search query which was looking for value in 7 different tables written as single query, it took about 5 seconds to search through 20000 records. After splitting it to 7 smaller queries overall time for such queries was about 0.2 second.


When optimizing a query, first look at the query execution plan that your database engine generates.

Then look first at how many rows you are forcing the poor database to execute, the FROM clause:

FROM  `columndata` a, `columngrid` b, `dcolumns` c, `drows` d

specifies a join of four tables. Say, a=55,000 rows, b=25 rows, c=25 rows, d=4,000 rows. Do the math. That is 137 BILLION rows in the expanded set. If you are not ultra careful about pruning the rows aggressively and early, you'll end up with a very slow-running query.

Then look at your WHERE clause. Without looking at the last clause (which is a large subquery), it seems that you are restricting b to a few rows, and d perhaps to a few hundred rows based on the document_id. So you're looking at roughly 55,000 x 5 (say), x 5 (say) x 500 (say) = 690 MILLION rows.

Obviously your query does not return so many rows, so they all got filtered in the last clause. However, you last clause in the WHERE clause is a subquery that refers to three more unrelated tables. Which means that you are forcing the database engine to loop through 690 MILLION rows, execute one subquery on EACH (and it is a COUNT query, the slowest kind), and then throw away most of the ones with COUNT <> 0.

Now do you understand why it is slow?

The first thing I can think of to optimize is to replace the 0=(SELECT ... subquery with NOT EXISTS (SELECT... which stops the subquery execution onces a row is found, instead of forcing the database engine to count all of them, and then checking whether the count is zero. You have no use of the count. You are checking for existence here. Use EXISTS.

Secondly, make the second query a part of the FROM clause (perhaps a WITH clause or just put in there) and LEFT JOIN it together with the columndata table (on the left side), then select (WHERE) on NULL (right side). This prunes the columndata table (the largest one) by only keep rows that do not exist in the subquery. A database's query optimizer should attempt to rewrite a subquery with a JOIN, but I am not sure how good MySQL's optimizer is.


Thank you for your responses. First I removed the recursive call and made it first in my PHP code to find the few column_numbers (integers) that are not to be shown. I then do a "NOT IN" check of the result array. See call below. This call gives this explain result.

id select_type table type possible_keys key key_len ref rows filtered
1 SIMPLE b range grid_id grid_id 8 NULL 6 1000.00 Using where; Extra Using temporary; Using filesort 1 SIMPLE d ref PRIMARY document_id 4 const 4000 100.00
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 b.column_id 1 100.00
1 SIMPLE a ref row_number column_id 8 c.id,d.id 1 100.00 Using where

SELECT DISTINCT a.id, a.string_data, b.grid_id, c.data_type,
d.document_id FROM columndata a, columngrid b, dcolumns c, drows d
WHERE b.grid_id = 9 AND d.document_id = 17 AND d.id = a.row_number
AND b.column_id = a.column_id AND c.id = a.column_id
AND a.column_id NOT IN (0,3,6) ORDER BY d.ordering, b.ordering LIMIT 0, 330

I also tried to make a join but this was much slower. Comments on how to improve with proper join syntax would be welcomed

First select rows (4000), then match columns to rows (50000) while removing columns that are hidden, then add extra data from columns and columgrid tables
SELECT t1.row_id t2.string_data, 2 AS grid_id, t5.data_type,
t1.document_id (SELECT DISTINCT a.id as row_id, a.string_data, a.ordering as row_ordering FROM drows a WHERE a.document_id = 7) t1 LEFT JOIN (SELECT DISTINCT b.row_id b.column_id b.string_data FROM dcolumns
WHERE b.column_id NOT IN (0,3,6)) t2 ON t1.id = t2.row_id LEFT JOIN (SELECT DISTINCT c.id, c.data_type) t3 ON t2.column_id = t3.id LEFT JOIN (SELECT DISTINCT d.column_id, d.row_id, d.ordering as column_ordering FROM columngrid) t4 ON(t1.id = t4.row_id AND t2.column_id = t4.column_id) ORDER BY t1.row_ordering, t4.column_ordering

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜