开发者

Optimize 5 table SQL query (stores => items => words)

Tables

stores (100,000 rows): id (pk), name, lat, lng, ...

store_items (9,000,000 rows): store_id (fk), item_id (fk)

items (200,000 rows): id(pk), name, ...

item_words (1,000,000 rows): item_id(fk), word_id(fk)

words (50,000 rows): id(pk), word VARCHAR(255)

Note: all ids are integers.

========

Indexes

CREATE UNIQUE INDEX storeitems_storeid_itemid_i ON store_items(store_id,item_id);

CREATE UNIQUE INDEX itemwords_wordid_itemid_i ON item_words(word_id,item_id);

CREATE UNIQUE INDEX words_word_i ON words(word);

Note: I prefer multi column indexes (storeitems_storeid_itemid_i and itemwords_wordid_itemid_i) because: http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/

QUERY

select s.name, s.lat, s.lng, i.name
from words w, item_words iw, items i, store_items si, stores s
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and s.id=si.store_id
and w.word='MILK';

Problem: elapsed time is 20-120 sec (depending on the word)!!!

explain $QUERY$
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                         | key                         | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | w     | const  | PRIMARY,words_word_i                                  | words_word_i                | 257     | const                       |    1 | Using index |
|  1 | SIMPLE      | iw    | ref    | itemwords_wordid_itemid_i,itemwords_itemid_fk         | itemwords_wordid_itemid_i   | 4       | const                       |    1 | Using index |
|  1 | SIMPLE      | i     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | iw.item_id                  |    1 |             |
|  1 | SIMPLE      | si    | ref    | storeitems_storeid_itemid_i,storeitems_itemid_fk      | storeitems_itemid_fk        | 4       | iw.item_id                  |   16 | Using index |
|  1 | SIMPLE      | s     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | si.store_id                 |    1 |             |

I want elapsed time to be less than 5 secs!!! Any ideas???

==============

What I tried

I tried to see when increase in the execution time happens by adding tables to the query.

1 table

select * from words where word='MILK';

Elapsed time: 0.4 sec

2 tables

select count(*)
from words w, item_words iw
where iw.word_id=w.id
and w.word='MILK';

Elapsed time: 0开发者_高级运维.5-2 sec (depending on word)

3 tables

select count(*)
from words w, item_words iw, items i
where iw.word_id=w.id
and i.id=iw.item_id
and w.word='MILK';

Elapsed time: 0.5-2 sec (depending on word)

4 tables

select count(*)
from words w, item_words iw, items i, store_items si
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and w.word='MILK';

Elapsed time: 20-120 sec (depending on word)

I guess the problem with the indexes or with the design of query/database. But there must be a way to make it work fast. Google does it somehow and their tables are much bigger!


a) You're actually writing queries to do FTS inside mysql -> use real FTS like lucene instead.

b) Clearly, adding the 9M row join is the performance issue

c) How about limiting that join (maybe it's being done in full with the current query plan) like this :

SELECT
    s.name, s.lat, s.lng, i.name
FROM
    (SELECT * FROM words WHERE word='MILK') w
INNER JOIN
    item_words iw
ON
    iw.word_id=w.id
INNER JOIN
    items i
ON
    i.id=iw.item_id
INNER JOIN
    store_items si
ON
    si.item_id=i.id
INNER JOIN
    stores s
ON
    s.id=si.store_id;

The logic behind this is that instead of joining full tables and then limiting the results, you start by limiting the tables on which you will join, this (if the join order happens to be the one I wrote) will greatly reduce your working set and inner queries running time.

d) Google does NOT use mysql for FTS


Consider de-normalising the structure - the first candidate is the 1 million record item_words table - bring the words directly into the table. Creating a list of unique words might be more easily achieved through a view (depends on how often you need this data compared to, for example, your need to extract a list of shops with products associated with a keyword). Secondly - create indexed views (not an option in MySQL, but certainly an option on other commercial databases).


You don't have an index that it can use to find the store_id if given the item_id. If the cardinality of store_id is low enough it might gain some benefit from storeitems_storeid_itemid_i, but since you have 100,000 stores this might not be so useful. You might try creating an index on store_items that lists the item_id first:

CREATE UNIQUE INDEX storeitems_item_store ON store_items(item_id, store_id);

Also, I'm not sure if putting join conditions in the where clause will affect performance adversely in the way you're seeing but you might try changing the query to something like this:

select s.name, s.lat, s.lng, i.name
from words w LEFT JOIN item_words iw ON w.id=iw.word_id
LEFT JOIN items i ON i.id=iw.item_id
LEFT JOIN store_items si ON si.item_id=i.id
LEFT JOIN stores s ON s.id=si.store_id
where w.word='MILK';


Without knowing the exact layout of your tables it's hard to give a good answer. But these types of multi table joins has a tendency to get really bogged down. Especially if one of the factors making up the expression of selection is a dynamic string.

You could try to return multiple resultsets of the tables in one go, from a stored procedure or something and then joining the data outside of SQL. This way I got the query time of a massive join down from 2 minutes to 4 seconds. Or do it using a temporary table and return the resultset from that when you are done.

Start with selecting from the words table since that's where you have the dynamic string. Then you can select from the other tables based on the data returned from that query.


Try this one.
Rewrite the query in such way

select s.name, s.lat, s.lng, i.name  
from words w LEFT JOIN item_words iw ON w.id=iw.word_id AND w.word='MILK'  
LEFT JOIN items i ON i.id=iw.item_id  
LEFT JOIN store_items si ON si.item_id=i.id  
LEFT JOIN stores s ON s.id=si.store_id  

And create index on (w.id, w.word)


Have you tried analyzing the tables ? this will help the optimiser select the best possible execution plan.

e.g:

ANALYZE TABLE words
ANALYZE TABLE item_words
ANALYZE TABLE items
ANALYZE TABLE store_items
ANALYZE TABLE stores

see: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜