Need basic advice regarding mysql indexes and query performance
I'm a relative newbie when it comes to database design. I know how to define the tables, but I'm starting to run into performance issues as my database grows. Can someone give me some guidance on how to use mysql indexes to improve query performance? My understa开发者_开发技巧nding of indexes is that they prevent the entire database table from being searched line by line, while still being able to return valid results.
I have typically just defined a primary key for each table that gives a unique id for each row. Is the idea behind defining an "index" that you do it for each field that will be used in the WHERE clause of a mysql statement? Not sure if that is too general of a statement.
As an example, lets say we have the following three tables:
products
products_id, products_name
categories
categories_id, categories_name
products_to_categories
products_id categories_id
and we want to run the following query:
SELECT p.products_name, c.categories_name
FROM categories as c
JOIN products as p
JOIN products_to_categories as p2c
WHERE p.products_id=12345
Would we define both fields in the products_to_categories table as indexes and then the products_id and categories_id as primary keys within their parent table?
Any advice or guidance on the general approach to indexes would be much appreciated!
Would we define both fields in the products_to_categories table as indexes and then the products_id and categories_id as primary keys within their parent table?
Yes, that's exactly what I would do.
Also, if any data is going to be unique, like product name for example, you can give a unique key to that column. This will let your database engine stop searching the entire table when it finds the unique item.
Here's your query optimized:
SELECT p.products_name, c.categories_name
FROM categories as c
LEFT JOIN products_to_categories as p2c ON p2c.categories_id = c.categories_id
INNER JOIN products as p ON p.products_id = p2c.products_id
WHERE p.products_id=12345
The best way to think about indexes is how you expect to query the data.
Let's assume that products_id and categories_id are PRIMARY KEY
in your database, which means they are indexed automatically. If not, start with that.
When I do multi-join tables, if you want to be paranoid, create two indexes to allow bi-directional accessibility of the IDs, e.g.
CREATE TABLE products_to_categories (
products_id integer unsigned NOT NULL,
categories_id integer unsigned NOT NULL,
INDEX p_to_c (products_id,categories_id),
INDEX c_to_p (categories_id,products_id)
) ENGINE=MyISAM;
This takes a lot of space, but it will be really, really fast, and unless you query both directions (from products to categories, and then reverse), it's probably overkill. Alternatively, by default, I do:
CREATE TABLE products_to_categories (
products_id integer unsigned NOT NULL,
categories_id integer unsigned NOT NULL,
INDEX p (products_id),
INDEX c (categories_id)
) ENGINE=MyISAM;
If you need some sort of constraint (many-to-one, one-to-many) then change your index types to UNIQUE etc.
In general, start with the latter definition, do your query, and run an EXPLAIN on it. If it shows anything which is more than 1 for the number of matched rows (except for the first table), then re-work the indexes.
Database indexing is really more a matter of testing and diagnostics than many think. I didn't know how to do this for a while, until I actually had a problem. In short:
- Create your indexes
- Determine your queries
- Run EXPLAIN on your queries, and run timing tests to determine query speed!
- Adjust your indexes
- Go back to 3
As one user commented below, EXPLAIN is a good starting point before running timing tests, but nothing beats actual timing tests in the wild.
Yes, define both fields in the products_to_categories table as indexes and then the products_id and categories_id as primary keys within their parent tables. And yes, the idea behind defining an index is that you do it for each field that will be used in the WHERE clause of a mysql statement.
Indexes are automatically generated for fields with a unique constraint, including primary keys. But indexes should exist on all fields used in a join in both tables. Generally, use the explain <SQL>
tool to find out exactly which indexes are being used. Watch out for full table scans of joined tables when you're looking for specific values.
Strange how most answers so far dont include a primary key on the products_to_categories table so it would be possible to insert duplicates e.g.
insert into products_to_categories (prod_id, cat_id) values (1,1),(1,2),(1,1); -- oops
You might want to check the following answer of mine which is optimised for selecting products by category without the need of secondary indexes.
Rewriting mysql select to reduce time and writing tmp to disk
精彩评论