MySQL indexing strategy
I am trying to use the following query on a table with ~200k records in it. There are all sorts of other fields that can be filtered by, but this is a base example.
SELECT b.isbn FROM books b
WHERE
b.price IS NOT NULL AND
b.deleted = '' AND
b.publication_date <= '2009-12-04' AND
(
b.subject1_id IN ('CAT1','CAT2','CAT3','CAT4','CAT5') OR
b.subject2_id IN ('CAT1','CAT2','CAT3','CAT4','CAT5') OR
b.subject3_id IN ('CAT1','CAT2','CAT3','CAT4','CAT5')
)
Currently, I have a separate index on all of these fields and this query takes ~4.5 seconds, which is way too long. EXPLAIN
lists NULL
under key.
I also tried creating one large index that included all of the fields in the above query, but EXPLAIN
shows that this multi-field index is not used.
How can I index these fields to speed up my queries?
EDIT: Here are my current indices (none of which seem to be used by the query):
- index(price)
- index(deleted)
- index(publication_date)
- index(subject1_id)
- index(subject2_id)
- index(subject3_id)
- index(price, deleted, publication_date, subject1_id, subject2_id, subject3_id)
EDIT2: Per ʞɔıu's answer - after normalizing the tables and using basically his query, it does speed it up some (time is ~3.5 seconds now), but not as 开发者_如何学Pythonmuch as I'm looking for. I indexed the new table as PRIMARY KEY(isbn, subject_id) and this index is being used for the join.
EDIT3: I added an additional index on the second table (subject_id, isbn), which helps. The addition of the other index that ʞɔıu mentions below helps a small bit, but only gets used when I use "FORCE INDEX" on the query. It's down to about 1.5 seconds now. Is there hope of getting it much lower?
You need to normalize your schema before indexing will be of any help in this case.
You can create another table that contains (subject, isbn), add indexes on book and subject, then join to that table, like:
select b.isbn from books b
inner join book_subject bs on bs.isbn=b.isbn
where
b.price is not null and b.deleted != 'DELETED'
AND b.publication_date <= '2009-12-04'
AND bs.subject in ('CAT1', 'CAT2'...)
Rule #1 (literally) in schema normalization is: "no repeating groups". Having the OR operation in the where clause across the 3 subject columns is going to prevent you from being able to take advantage of an index for that part of the query.
(updated to reflect that isbn is the primary key)
I'd think more about what your query actually means and that may help lead the way to your answer.
Let's desugar it to show you the issue.
SELECT b.isbn FROM books b
WHERE
b.price IS NOT NULL AND
b.deleted != 'DELETED' AND
b.publication_date <= '2009-12-04' AND
(
b.subject1_id = 'CAT1' OR
b.subject1_id = 'CAT2' OR
b.subject1_id = 'CAT3' OR
b.subject1_id = 'CAT4' OR
b.subject1_id = 'CAT5' OR
b.subject2_id = 'CAT1' OR
b.subject2_id = 'CAT2' OR
b.subject2_id = 'CAT3' OR
b.subject2_id = 'CAT4' OR
b.subject2_id = 'CAT5' OR
b.subject3_id = 'CAT1' OR
b.subject3_id = 'CAT2' OR
b.subject3_id = 'CAT3' OR
b.subject3_id = 'CAT4' OR
b.subject3_id = 'CAT5'
)
There obviously isn't any index it will use beyond ones for (price,deleted,publication_date,subject1) and so on for the other subjects.
What fields are in the index?
First of all: MySQL can use only one Index per table during a select. It tries to choose the best possible index, but sometimes the server cannot decide for several reasons. Having multiple indexes on only one field each would only help, if you had a lot of statements running with only one where clause at a time.
To optimize here: You need to create an index which does include the fields
price
deleted
publication_date
DO NOT include the categories, since you are using the OR clause.
ALTER TABLE `test`.`books` ADD INDEX `idxPriceDeletedPublication`(`price`, `deleted`, `publication_date`);
This should then give you the following EXPLAIN output:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b
type: range
possible_keys: idxPriceDeletedPublication
key: idxPriceDeletedPublication
key_len: 5
ref: NULL
rows: 1
Extra: Using where
1 row in set (0.00 sec)
In relation to nick's post :
create another table that contains (subject, book_id), add indexes on book and subject:
wouldn't it be cleaner to have
select b.isbn from books b
where
#various table b where restrictions
AND b.isbn IN (
Select isbn
from book_subject bs
where bs.subject IN ('CAT1', 'CAT2' ...)
)
精彩评论