开发者

Poor MySQL Join Performance

I've been trying to perform a join on two tables in MySQL, and the query will run for a minute or two before I run out of memory without getting results. I'm far from a database expert, so I'm not sure if I'm writing my queries poorly, if I have some MySQL settings poorly configured, or if I really should be doing something else entirely with my query. FYI the database is located locally on my machine.

I have a large table (~2 million records) where one of the columns is an ID into a small table (~3000 records). In case this matters, the ID is not unique in the large table but is unique in the small table. I've tried various flavors of the following query, but nothing seems to be working:

SELECT big_table.*, 
       small_table.col 
  FROM big_table 
left outer join small_table on (big_table.small_id = small_table.id)

I'm doing a lot of analysis on the data that does require all 2 million rows, though not necessarily in a single query. Here are the results of my "show create table":

'big_table', 'CREATE TABLE 'big_table' (
  'BIG_ID_1', varchar(12) NOT NULL,
  'BIG_ID_2', int(100) NOT NULL,
  'SMALL_ID' varchar(8) DEFAULT NULL,
  'TYPICAL_OTHER_COLUMN' varchar(3) DEFAULT NULL,
  ...
  PRIMARY KEY ('BIG_ID_1', 'BIG_ID_2')
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1'

'small_table', 'CREATE TABLE `small_table` (
  `id`, varchar(8) NOT NULL 开发者_C百科DEFAULT '''',
  `col`, varchar(1) DEFAULT NULL,
  ...
  PRIMARY KEY (`id`),
  KEY `inx_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

Here is the "explain" result for one of my candidate queries:

id  select_type  table        type    possible_keys   key      key_len  ref                         rows     extra
1   SIMPLE       big_table    ALL     NULL            NULL     NULL     NULL                        1962193       
1   SIMPLE       small_table  eq_ref  PRIMARY,inx_id  PRIMARY  10       db_name.big_table.SMALL_ID  1             


You are selecting about 2 million records in a single query. Depending on the amount of data in each row it could be hundreds of megabytes of data that you are requesting.

Things you might want to try:

  • If you don't need all columns then query for the columns you need instead of using SELECT table.*.
  • See if you can move some (or all) of the processing to the database instead of fetching the data and processing it in the client.
  • Avoid reading the entire result set into memory in one go.
  • Process the rows in batches of a few thousand at a time rather than fetching all of them at once.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜