SQL Indexes and performance improvement
I have some questions about SQL indexes and how they improve performance. Hope you guys can answer them! :D
- What's the difference on creating an开发者_如何学编程 index of my entire table and an index of my table and a few columns? Are those indexes, with just a few columns specified, faster?
You didn't specify what database system you're using - those kinds of things are always very vendor specific.
Here's my know-how that applies to SQL Server:
first of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve??
Consider this: if you have an index on (LastName, FirstName, Street, City)
, that index will not be able to be used to speed up queries on
FirstName
aloneCity
Street
That index would be useful for searches on
(LastName)
, or(LastName, FirstName)
, or(LastName, FirstName, Street)
, or(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street
or just City
!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.
If you want to be able to search by Street
- you need to add a separate index on (Street)
(and possibly another column or two that make sense).
If you want to be able to search by Occupation
or whatever else - you need another specific index for that.
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.
There are entire books written about the subject, but in simplest terms I can think of at the moment, indexes return results faster by the mere fact that they are sorted in the order you want them to be (so records can be found faster) and they are smaller, so fewer blocks have to be scanned to read the index records you want.
Simplistic I know, but it might be a good start in thinking about indexes.
In response to your specific question, generally you would not create an index of the entire table. When you only have a few columns in the index, you get improvement in two ways. A) The records can be found faster, because they are in a predefined order and good search routines can be used, and B) you might not even have to go to the actual table to return your data -- it might already be in the index itself.
You need to understand the data in your table and how it will be accessed. For example, if your table has some date column, "mydate", and your system will be querying your table based on date, e.g. "SELECT * FROM my_table WHERE mydate BETWEEN '2002-01-31' and '2003-05-14'", then the database can efficiently obtain the query rows based on the "mydate" column index.
If there is no index, the database needs to scan the whole table to find all rows that satisfy the query. If you table has 100 rows, a full table scan isn't such a big deal. If you have 2.5 million rows, a full table scan is a big deal.
精彩评论