How to optimize a slow "select distinct" query across three tables, 40k rows, that only returns 22 results

2023-03-08 17:52 问答作者：

So I have this query written by someone else that I'm trying to refactor, which pulls some features/materials for an item(shoes, generally).

There are a lot of products, and thus a whole lot of joining table-entries, but only a few few features that are available for them. I'm thinking that there has to be a way to cut down the need to touch upon the "big" list of items, to get these features, and I have heard that distinct is to be avoided, but I don't have a statement that can replace the "distinct" options here.

According to my logs, I'm getting slow result times:

Query_time: 7 Lock_time: 0 Rows_sent: 32 Rows_examined: 5362862

Query_time: 8 Lock_time: 0 Rows_sent: 22 Rows_examined: 6581994

As the message says, sometimes it is taking 7 or 8 seconds and sometimes or every time it is querying over 5 million rows.

That may be due to other load occurring at the same time, because here are the selects run on the database directly from the mysql command line:

mysql> SELECT DISTINCT features.FeatureId, features.Name
       FROM features, itemsfeatures, items
       WHERE items.FlagStatus != 'U'
         AND items.TypeId = '13'
         AND features.Type = 'Material'
         AND features.FeatureId = itemsfeatures.FeatureId
       ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (0.00 sec)

mysql> select count(*) from features;
+----------+
| count(*) |
+----------+
|      122 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from itemsfeatures;
+----------+
| count(*) |
+----------+
|    38569 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
|     8656 |
+----------+
1 row in set (0.00 sec)

explain SELECT DISTINCT features.FeatureId, features.Name  FROM features, itemsfeatures, items    WHERE items.FlagStatus != 'U'  AND items.TypeId = '13'  AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId  ORDER BY features.Name;
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
| id | select_type | table         | type | possible_keys     | key       | key_len |开发者_JAVA百科 ref                             | rows | Extra                                        |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | features      | ref  | PRIMARY,Type      | Type      | 33      | const                           |   21 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | itemsfeatures | ref  | FeatureId         | FeatureId | 4       | sherman_live.features.FeatureId |  324 | Using index; Distinct                        |
|  1 | SIMPLE      | items         | ALL  | TypeId,FlagStatus | NULL      | NULL    | NULL                            | 8656 | Using where; Distinct; Using join buffer     |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
3 rows in set (0.04 sec)

Edit:

Here is sample results without the distinct, (but with a limit, since otherwise it just hangs) for comparison:

SELECT features.FeatureId, features.Name        FROM features, itemsfeatures, items        WHERE items.FlagStatus != 'U'          AND items.TypeId = '13'          AND features.Type = 'Material'          AND features.FeatureId = itemsfeatures.FeatureId        ORDER BY features.Name limit 10;
+-----------+-----------+
| FeatureId | Name      |
+-----------+-----------+
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
+-----------+-----------+
10 rows in set (23.30 sec)

here's using a group by instead of a select distinct:

SELECT features.FeatureId, features.Name        FROM features, itemsfeatures, items        WHERE items.FlagStatus != 'U'          AND items.TypeId = '13'          AND features.Type = 'Material'          AND features.FeatureId = itemsfeatures.FeatureId        group by features.name ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (13.28 sec)

Edit: Added a bounty

...Because I'm trying to understand this general problem, how to replace bad select distinct queries in general, in addition to the slowness that this query specifically tends to cause.

I'm wondering whether the replacement for a select distinct is generally a group by (although in this case that isn't a comprehensive solution since it's still slow)?

Looks like you're missing a JOIN condition linking itemsfeatures to items. It's more obvious if you write the query using explicit JOIN operations.

SELECT DISTINCT f.FeatureId, f.Name  
    FROM features f
        INNER JOIN itemsfeatures ifx
            ON f.FeatureID = ifx.FeatureID
        INNER JOIN items i
            ON ifx.ItemID = i.ItemID /* This is the part you're missing */
    WHERE i.FlagStatus != 'U'  
        AND i.TypeId = '13'  
        AND f.Type = 'Material' 
    ORDER BY f.Name;

As Joe states, there does seem to be a missing join condition

This is your current query

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features, 
        itemsfeatures, 
        items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name

This is your query with explicit joins

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId CROSS JOIN
        items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
ORDER BY features.Name

I can't be 100% sure but it looks like removing any reference to the items table should give you the exact same result

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features, 
        itemsfeatures
WHERE   features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name

The way the query is written, it seems it wants a list of materials for items with a typeID of 13 and Flagstatus <> U. If that is the case the result being returned by the orignial query are wrong. It is simply returning all materials for all items.

So as Joe states add the inner join for items and use explicit joins as they make the meaning clearer. I prefer to use group by but distinct will do the same thing.

SELECT  features.FeatureId, 
        features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId INNER JOIN
        items on itemsfeatures.ItemID = items.ItemID
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
GROUP BY features.FeatureId, 
        features.Name
ORDER BY features.Name

With that now sorted, now comes the speed. Create the following three indexes.

FeaturesIndex(Type,FeatureID,Name)
ItemsFeaturesIndex(FeatureId)
ItemsIndex(TypeId,FlagStatus,ItemID)

This should speed up both your current query and the one I listed.

I am almost confident that Joe's answer is correct. But if you think that Joe is wrong and you want to get the same results as your original query, but faster, then use this query:

SELECT DISTINCT features.FeatureId, features.Name
    FROM features, itemsfeatures
    WHERE features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
    ORDER BY features.Name;

继续阅读：distinct query-optimization sql

How to optimize a slow "select distinct" query across three tables, 40k rows, that only returns 22 results

Edit: Added a bounty

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

Edit: Added a bounty

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生 新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？