MySQL count(*) , Group BY and INNER JOIN
I have a really bad time with a query on MySQL 5.1. I simplified the 2 tables I make a JOIN on :
CREATE TABLE `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;
AND
CREATE TABLE `jobsCategories` (
`jobID` int(11) NOT NULL,
`industryID` int(11) NOT NULL,
KEY `jobID` (`jobID`),
KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
The query is straight forward :
SELECT count(*) as nb,industryID
FROM jobs J
INNER JOIN jobsCategories C ON C.jobID=J.id
GROUP BY industryID
ORDER BY nb DESC;
I got around 150000 records into the jobs table, and 350000 records into the jobsCategories table, and I have 30 industries;
The query takes approximatively 50 seconds to execute !!!
Do you have any idea why it takes so long? How could I optimize the structure of this database? Profilling the query show me that 99% of the execution time is spend on copying on tmp tables.
EXPLAIN <query> gives me :
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: J
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 178950
Extra: Using index; Using temporary; Using file开发者_开发技巧sort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: C
type: ref
possible_keys: jobID
key: jobID
key_len: 8
ref: J.id
rows: 1
Extra: Using where
2 rows in set (0.00 sec)
About the memory :
free -m :
total used free shared buffers cached
Mem: 2011 1516 494 0 8 1075
-/+ buffers/cache: 433 1578
Swap: 5898 126 5772
With the FORCE INDEX suggested below
select count(*) as nb, industryID
from
jobs J
inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID
order by nb DESC;
SHOW PROFILE;
gives me :
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000095 |
| Opening tables | 0.000014 |
| System lock | 0.000008 |
| Table lock | 0.000007 |
| init | 0.000032 |
| optimizing | 0.000011 |
| statistics | 0.000032 |
| preparing | 0.000016 |
| Creating tmp table | 0.000031 |
| executing | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result | 0.000028 |
| Sending data | 0.000024 |
| end | 0.000003 |
| removing tmp table | 0.000009 |
| end | 0.000004 |
| query end | 0.000003 |
| freeing items | 0.000029 |
| logging slow query | 0.000003 |
| cleaning up | 0.000003 |
+----------------------+----------+
I guess my RAM (2Gb) is not large enough. How can I be certain this is the case?
Firstly I think that you don't need to join table jobs in order to get the same result (unless you have some garbage data in table jobsCategories):
select count(*) as nb, industryID
from jobsCategories
group by industryID
order by nb DESC;
Otherwise you may try to force index on industryID:
select count(*) as nb, industryID
from
jobs J
inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID
order by nb DESC;
change your tables to InnoDB =) InnoDB is good managing big tables and the COUNT(*) to make it faster
http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/
Good Luck
EDIT:
after testing, it seems that MyISAM is faster than InnoDB when using COUNT(*)
when there is no WHERE
clause:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
anyway, i've tested your exact query simulating the tables that you have (150k Jobs and 300k JobsCategories) using MyISAM tables and it took 1.5 seconds so maybe your problem is elsewhere.. it's all i can tell you =P
Hope I'm not misinterpreting the reading, but from what I see, you don't need ANY join. Since your grouping is how many jobs fall under each respective industry, its all in your job categories table, why join to the actual job table for the title of the job since that is not even being returned
select IndustryID,
count(*) JobsPerIndustry
from JobCategories
group by IndustryID
EDIT PER COMMENT / FEEDBACK...
That definitely makes a difference... adding a criteria associated with a job... Ensure your Jobs table has an index on the element you are expecting to allow limiting based on... Then follow similar query like you originally had. Ensure your Jobs table has an index on CountryID.
SELECT
count(*) as nb,
industryID
FROM jobs J
JOIN jobsCategories C
ON J.ID = C.jobID
WHERE
J.countryID=1234
GROUP BY
industryID
ORDER BY
nb DESC;
精彩评论