Speeding up this big JOIN

2023-01-26 18:20 问答作者：

EDIT: there was a mistake in the following question that explains the observations. I could delete the question but this might still be useful to someone. The mistake was that the actual query running on the server was SELECT * FROM t (which was silly) when I thought it was running SELECT t.* FROM t (which makes all the difference). See tobyobrian's answer and the comments to it.

I've a too slow query in a situation with a schema as follows. Table t has data rows indexed by t_id. t adjoins tables x and y via junction tables t_x and t_y each of which contains only the foreigns keys required for the JOINs:

CREATE 开发者_开发知识库TABLE t (
  t_id INT NOT NULL PRIMARY KEY,
  data columns...
);
CREATE TABLE t_x (
  t_id INT NOT NULL,
  x_id INT NOT NULL,
  PRIMARY KEY (t_id, x_id),
  KEY (x_id)
);
CREATE TABLE t_y (
  t_id INT NOT NULL,
  y_id INT NOT NULL,
  PRIMARY KEY (t_id, y_id),
  KEY (y_id)
);

I need to export the stray rows in t, i.e. those not referenced in either junction table.

SELECT t.* FROM t
LEFT JOIN t_x ON t_x.t_id=t.t_id
LEFT JOIN t_y ON t_y.t_id=t.t_id
WHERE t_x.t_id IS NULL OR t_y.t_id IS NULL
INTO OUTFILE ...;

t has 21 M rows while t_x and t_y both have about 25 M rows. So this is naturally going to be a slow query.

I'm using MyISAM so I thought I'd try to speed it up by preloading the t_x and t_y indexes. The combined size of t_x.MYI and t_y.MYI was about 1.2 M bytes so I created a dedicated key buffer for them, assigned their PRIMARY keys to the dedicated buffer and LOAD INDEX INTO CACHE'ed them.

But as I watch the query in operation, mysqld is using about 1% CPU, the average system IO pending queue length is around 5, and mysqld's average seek size is in the 250 k range. Moreover, nearly all the IO is mysqld reading from t_x.MYI and t_x.MYD.

I don't understand:

Why mysqld is reading the .MYD files at all?
Why mysqld isn't using the preloaded the t_x and t_y indexes?

Could it have something to do with the t_x and t_y PRIMARY keys being over two columns?

EDIT: The query explained:

| id | select_type | table | type | possible_keys | key     | key_len | ref       | rows     | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL    | NULL    | NULL      | 20980052 |             | 
|  1 | SIMPLE      | t_x   | ref  | PRIMARY       | PRIMARY | 4       | db.t.t_id |   235849 | Using index | 
|  1 | SIMPLE      | t_y   | ref  | PRIMARY       | PRIMARY | 4       | db.t.t_id |   207947 | Using where | 
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+

Use not exists - this will be the fastest - much better than 'joins' or using 'not in' in this sitution.

SELECT t.* FROM t a
Where not exists (select 1 from t_x b
                  where b.t_id = a.t_id)
or not exists (select 1 from t_y c
                where c.t_id = a.t_id);

I can answer part 1 of your question, and i may or may not be able to answer part two if you post the output of EXPLAIN:

In order to select t.* it needs to look in the MYD file - only the primary key is in the index, to fetch the data columns you requested it needs the rest of the columns.

That is, your query is quite probably filtering the results very quickly, its just struggling to copy all the data you wanted.

Also note that you will probably have duplicates in your output - if one row has no refs in t_x, but 3 in x_y you will have the same t.* repeated 3 times. Given we think the where clause is sufficiently efficient, and much time is spent on reading the actual data, this is quite possibly the source of your problems. try changing to select distinct and see if that helps your efficiency

This may be a bit more efficient:

SELECT * 
FROM t
WHERE t.id NOT IN (
  SELECT DISTINCT t_id
  FROM t_x
  UNION
  SELECT DISTINCT t_id
  FROM t_y
);

继续阅读：optimization query-optimization sql

Speeding up this big JOIN

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？