What is the minimum number of rows required to create an index?
I have created script to find selectivity of each column for all tables. In some tables with less than 100 rows, selectivity of a column 开发者_Go百科is more than 50%. Where Selectivity = Distinct Values / Total Number Rows. Are those columns eligible for an index? Or, can you tell me the minimum requirement for number of rows to create an index?
You can index on any column - the question is whether it makes any sense and whether that index will be used....
Typically, a selectivity of less than 1-5% might work - the smaller that percentage, the better. The best is single values out of a large population, e.g. a single customer ID out of hundreds of thousands - those indices will definitely be used.
Things like gender (only 2 values) or other things that only have a very limited number of possible values typically don't work well on an index. At least on their own - these columns might be ok to be included into another index as a second or third column.
But really, the only way to find out whether or not an index makes sense is to
- measure your queries before
- create the index
- run your queries again, check their execution plans, measure their timings
There's no golden rule as to when an index will be used (or ignored) - too many variables play into that decision.
For some expert advice on how to deal with indices, and how to find out which indices might not get used, and when it makes sense to create an index, see Kimberly Tripp's blog posts:
- Spring cleaning your indices (part 1)
- Spring cleaning your indices (part 2)
- Why aren't those non-clustered indices being used?
Most DBMS use a cache for data and code (stored procedure, execution plan, etc.). In SQL Server I think it's called the data and procedure cache, and in Oracle, it's called the buffer cache and the SGA. Table data and/or index can be in the cache.
Small table which are frequently accessed will most likely fit in the cache. But the table can be evicted from the cache, say, if a query load fresh data from the disk. There are options to indicate that you want a table to be permanently in the cache (See PINTABLE). That's maybe a better strategy that using an index if your table is very small (which is your case). Adding an index (which would also always be in the cache) could help further, but I don't know what would be the gain.
The big different in performance is disk access vs. memory access. Purpose of index is to reduce the amount of data to read from the disk, but if it's already in memory, gain is probably small.
I'm not sure about sql-server, but most DBMS don't use an index for retrieval if it can retrieve all of the table rows in a single I/O. You will see this on PLAN explanations, some tables are always tablespace scanned.
IMHO, any table with less than 5000 rows is not worth analysing for cardinality if the DBMS is running on a server.
精彩评论