How many fields should be indexed and how should I create them?
I've got a table in a MySQL database that has the following fields:
ID | GENDER | BIRTHYEAR | POSTCODE
Users can search the table using any of the fields in any combination (i.e., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000);
or SE开发者_JAVA技巧LECT * FROM table WHERE BIRTHYEAR = 1973;
)
From the MySQL docs, it uses left indexing. So if I create an index on all 4 columns it won't use the index if the ID field isn't used. Do I need to create an index for every possible combination of field (ID; ID/GENDER; ID/BIRTHYEAR; etc.) or will creating one index for all fields be sufficient?
If it makes any difference, there are upwards of 3 million records in this table.
In this situation I typically log search criteria, number of results returned and time taken to perform the search. Just because you're creating the flexibility to search by any field doesn't mean your users make use of this flexibility. I'd normally create indexes on sensible combinations and then once I've determined the usage patterns drop the lowly used indexes or create new unsuspected indexes.
I'm not sure if MySQL supports statistics or histograms for skewed data but the index on gender may or may not work. If MySQL supports statistics then this will indicate the selectivity of an index. In a general population an index on a field with a 50/50 split won't help. If you're sample data is computer programmers and the data is 95% males then a search for females would use the index.
Use EXPLAIN.
(I'd say, use Postgres, too, lol).
It seems recent versions of MySQL can use several indexes in the same query, they call this Index Merge. In this case 1 index per column will be enough.
Gender is a special case, since selectivity is 50% you don't need an index on it, it would be counterproductive.
Creating indexes on single fields is useful but it would be really useful if your data was of varchar type and each record had a different value, since birthyear and postcode are numbers they are already well indexed.
You can index birthyear because it should be different for many of the records (but up to 120 birthyears in total at max I guess).
Gender in my opinion doesn't need an index.
You can find out what field combinations are most likely to give different results and index those, like: birthyear - postcode, id - birthyear, id - postcode.
精彩评论