MySQL join query performance issue

Posted 2023-02-16 00:33

I am running the query below:

SELECT packages.id, packages.title, subcat.id, packages.weight
FROM packages ,provider, packagestosubcat, 
     packagestocity, subcat, usertosubcat, 
     usertocity, usertoprovider 
WHERE packages.endDate >'2011-03-11 06:00:00' AND 
      usertosubcat.userid = 1 AND 
      usertocity.userid = 1 AND 
      packages.providerid = provider.id AND 
      packages.id = packagestosubcat.packageid AND 
      packages.id = packagestocity.packageid AND 
      packagestosubcat.subcatid = subcat.id AND 
      usertosubcat.subcatid = packagestosubcat.subcatid AND 
      usertocity.cityid = packagestocity.cityid AND 
      (
          provider.providertype = 'reg' OR 
          (
              usertoprovider.userid = 1 AND 
              provider.providertype != 'reg' AND 
              usertoprovider.providerid = provider.ID
          )
      ) 
GROUP BY packages.title 
ORDER BY subcat.id, packages.weight DESC

When I run EXPLAIN, everything looks OK except for the scan on the usertoprovider table, which doesn't seem to use the table's keys:

id select_type table            type    possible_keys         key       key_len ref                       rows Extra
1  SIMPLE      usertocity       ref     user,city             user      4       const                     4    Using temporary; Using filesort
1  SIMPLE      packagestocity   ref     city,packageid        city      4       usertocity.cityid         419  
1  SIMPLE      packages         eq_ref  PRIMARY,enddate       PRIMARY   4       packagestocity.packageid  1    Using where
1  SIMPLE      provider         eq_ref  PRIMARY,providertype  PRIMARY   4       packages.providerid       1    Using where
1  SIMPLE      packagestosubcat ref     subcatid,packageid    packageid 4       packages.id               1    Using where
1  SIMPLE      subcat           eq_ref  PRIMARY               PRIMARY   4       packagestosubcat.subcatid 1  
1  SIMPLE      usertosubcat     ref     userid,subcatid       subcatid  4       const                     12   Using where
1  SIMPLE      usertoprovider   ALL     userid,providerid     NULL      NULL    NULL                      3735 Using where

As you can see in the above query, the condition itself is:

provider.providertype = 'reg' OR 
(
    usertoprovider.userid = 1 AND 
    provider.providertype != 'reg' AND 
    usertoprovider.providerid = provider.ID
)

Both tables, provider and usertoprovider, are indexed. provider has indexes on providerid and providertype, while usertoprovider has indexes on userid and providerid.

The cardinality of the keys is: provider.id=47, provider.type=1, usertoprovider.userid=1245, usertoprovider.providerid=6

So it's quite obvious that the indexes are not used.

Furthermore, to test it out, I went ahead and:

Duplicated the usertoprovider table
Inserted all the provider values that have providertype='reg' into the cloned table
Simplified the condition to (usertoprovider.userid = 1 AND usertoprovider.providerid = provider.ID)

The query execution time changed from 8.1317 sec. to 0.0387 sec.

Still, provider values that have providertype='reg' are valid for all users, and I would like to avoid inserting them into the usertoprovider table for every user, since that data is redundant.

Can someone please explain why MySQL still runs a full scan and doesn't use the keys? What can be done to avoid it?

It seems that provider.providertype != 'reg' is redundant: that branch only matters when the first leg of the OR is not true, so the check adds nothing, unless provider.providertype is nullable and you want rows with a NULL providertype to be filtered out.

Also, shouldn't != be <> to be standard SQL? MySQL does accept != as well.
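To make the NULL case concrete, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made for convenience; NULL comparisons behave the same way in both for these expressions), and the helper name evaluate is made up for illustration. The constant 1 stands for "the usertoprovider row matches user 1 and this provider".

```python
import sqlite3

# SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")

def evaluate(providertype):
    """Return (original condition, condition without the != check)
    for a provider that IS mapped to the user (both mapping tests are 1)."""
    return conn.execute(
        """
        SELECT
            (:t = 'reg') OR (1 AND :t != 'reg' AND 1),
            (:t = 'reg') OR (1 AND 1)
        """,
        {"t": providertype},
    ).fetchone()

results = {value: evaluate(value) for value in ("reg", "other", None)}
for value, outcome in results.items():
    print(value, outcome)
# reg   (1, 1)
# other (1, 1)
# None  (None, 1)   <- only a NULL providertype makes the two forms differ
```

So dropping the check changes the result only for rows where providertype is NULL.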

On cost of table scans

A full table scan is not necessarily more expensive than walking an index, because walking an index still requires multiple page accesses. In many database engines, if the table is small enough to fit in a few pages and the number of rows is small, a table scan is cheaper. Database engines make this kind of decision based on the table's data and index statistics.

This case

However, in your case, it might also be because of the other leg in your OR clause: provider.providertype = 'reg'. If providertype is "reg", then this query joins in ALL the rows of usertoprovider (most likely not what you want) since it is a multi-table cross join.

The database engine is correct in determining that you'll likely need all the rows of usertoprovider anyway (unless no provider has providertype "reg", but the engine may know that too!).

The query hides this fact because you are grouping on the (MASSIVE!) result set later on and just returning the package ID, so you won't see how many usertoprovider rows have been returned. But it will run very slowly. Get rid of the GROUP BY clause to find out how many rows you are actually forcing the database engine to work on!!!

The reason you see a massive speed improvement if you fill out the usertoprovider table is because then every row participates in a join, and there is no full cross join happening in the case of "reg". Before, if you have 1,000 rows in usertoprovider, every row with type="reg" expands the result set 1,000 times. Now, that row joins with only one row in usertoprovider, and the result set is not expanded.
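The expansion is easy to reproduce on toy data. The sketch below is only an illustration: it uses Python's sqlite3 module instead of MySQL, keeps just the two tables involved, and the rows are invented (one "reg" provider, one other provider, and 1,000 mapping rows of which exactly one belongs to user 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider (id INTEGER PRIMARY KEY, providertype TEXT NOT NULL);
    CREATE TABLE usertoprovider (userid INTEGER, providerid INTEGER);
    INSERT INTO provider VALUES (1, 'reg'), (2, 'private');
""")
# 1,000 mapping rows, all pointing at provider 2; only userid 1 is "our" user.
conn.executemany(
    "INSERT INTO usertoprovider VALUES (?, 2)",
    [(userid,) for userid in range(1, 1001)],
)

# Same shape as the condition in the question, without the other tables.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM provider, usertoprovider
    WHERE provider.providertype = 'reg'
       OR (usertoprovider.userid = 1
           AND provider.providertype != 'reg'
           AND usertoprovider.providerid = provider.id)
""").fetchone()[0]

print(joined_rows)  # 1001: the 'reg' provider pairs with all 1,000 rows, provider 2 with one
```

Two providers are visible to the user, yet the join produces 1,001 rows before any GROUP BY collapses them.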

If you really want to pass anything with providertype='reg', but not in your many-to-many mapping table, then the easiest way may be to use a sub-query:

Remove usertoprovider from your FROM clause
Replace the OR condition in your WHERE clause with the following:

provider.providertype='reg' OR EXISTS (SELECT * FROM usertoprovider WHERE userid=1 AND providerid = provider.ID)
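A minimal sketch of the EXISTS form on toy data, again assuming Python's sqlite3 module as a stand-in for MySQL and with invented rows (provider 3 is not "reg" and is mapped only to another user):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider (id INTEGER PRIMARY KEY, providertype TEXT NOT NULL);
    CREATE TABLE usertoprovider (userid INTEGER, providerid INTEGER);
    INSERT INTO provider VALUES (1, 'reg'), (2, 'private'), (3, 'private');
    INSERT INTO usertoprovider VALUES (1, 2), (2, 3), (3, 2);
""")

# usertoprovider is no longer in FROM, so it cannot multiply the rows.
visible = [row[0] for row in conn.execute("""
    SELECT provider.id
    FROM provider
    WHERE provider.providertype = 'reg'
       OR EXISTS (SELECT 1
                  FROM usertoprovider
                  WHERE usertoprovider.userid = 1
                    AND usertoprovider.providerid = provider.id)
    ORDER BY provider.id
""")]

print(visible)  # [1, 2]: one row per provider, no matter how large usertoprovider is
```

Each provider appears at most once because the subquery only answers yes or no.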

Another method is to use an OUTER JOIN on the usertoprovider -- any row with "reg" which is not in the table will come back with one row of NULL instead of expanding the result set.
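The outer-join variant could look roughly like this. It is a sketch on the same kind of invented toy data, run through Python's sqlite3 module rather than MySQL, and it assumes each (userid, providerid) pair appears at most once in usertoprovider; duplicates in the mapping table would duplicate providers in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider (id INTEGER PRIMARY KEY, providertype TEXT NOT NULL);
    CREATE TABLE usertoprovider (userid INTEGER, providerid INTEGER);
    INSERT INTO provider VALUES (1, 'reg'), (2, 'private'), (3, 'private');
    INSERT INTO usertoprovider VALUES (1, 2), (2, 3), (3, 2);
""")

# The user filter lives in the ON clause, so providers without a mapping
# for user 1 still come back once, with NULLs on the usertoprovider side.
visible_ids = [row[0] for row in conn.execute("""
    SELECT provider.id
    FROM provider
    LEFT OUTER JOIN usertoprovider
           ON usertoprovider.providerid = provider.id
          AND usertoprovider.userid = 1
    WHERE provider.providertype = 'reg'
       OR usertoprovider.providerid IS NOT NULL
    ORDER BY provider.id
""")]

print(visible_ids)  # [1, 2]
```

Putting userid = 1 in the WHERE clause instead of the ON clause would turn this back into an inner join and drop the "reg" providers that have no mapping row.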

Hmm, I know that MySQL does funny things with grouping. In most other RDBMSs, your query would not even be executed. What does that even mean,

SELECT packages.id 
[...]
GROUP BY packages.title 
ORDER BY subcat.id, packages.weight DESC

You want to group by title. In standard SQL syntax, this means you can only select title and aggregate functions of the other columns. MySQL magically tries to execute the query anyway (and probably guesses what you may have meant). So what would you expect to be selected as packages.id? The first matching package ID for every title? Or the last? And what would the ORDER BY clause mean with respect to the grouping? How can you order by columns that are not part of the result set (because only packages.title really is)?
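One way to remove the guesswork is to say explicitly which ID you want per title. The sketch below assumes "the lowest ID per title" is what is meant, which may not be the asker's intent; it uses Python's sqlite3 module as a stand-in for MySQL, and the package rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    INSERT INTO packages VALUES (1, 'Spa day'), (2, 'Spa day'), (3, 'City tour');
""")

# Every selected column is either the grouping column or an aggregate,
# so the result does not depend on which row the engine happens to pick.
rows = conn.execute("""
    SELECT title, MIN(id) AS id
    FROM packages
    GROUP BY title
    ORDER BY title
""").fetchall()

print(rows)  # [('City tour', 3), ('Spa day', 1)]
```

Written this way, the query means the same thing on any engine, and the ORDER BY refers only to columns that exist in the grouped result.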

There are two solutions, as far as I can see:

You're on the right track with your query. In that case, remove the ORDER BY clause: I don't think it affects your result, but it may severely slow down your query.
You have a SQL problem, not a performance problem
