Why use INCLUDE in a SQL index
I recently encountered an index in a database I maintain that was of the form:
CREATE INDEX [IX_Foo] ON [Foo]
( Id ASC )
INCLUDE
( SubId )
In this particular case, the performance problem that I was encountering (a slow SELECT filtering on both Id and SubId) could be fixed by simply moving the SubId column into the index proper rath开发者_如何学JAVAer than as an included column.
This got me thinking however that I don't understand the reasoning behind included columns at all, when generally, they could simply be a part of the index itself. Even if I don't particularly care about the items being in the index itself is there any downside to having column in the index rather than simply being included.
After some research, I am aware that there are a number of restrictions on what can go into an indexed column (maximum width of the index, and some column types that can't be indexed like 'image'). In these cases I can see that you would be forced to include the column in the index page data.
The only thing I can think of is that if there are updates on SubId, the row will not need to be relocated if the column is included (though the value in the index would need to be changed). Is there something else that I'm missing?
I'm considering going through the other indexes in the database and shifting included columns in the index proper where possible. Would this be a mistake?
I'm primarily interested in MS SQL Server, but information on other DB engines is welcome also.
The answers so far are all correct and all - but they might not convey enough what you gain from a covering index.
In your case, you have a table Foo
and some fields, including an Id
(which I assume is the primary key), and a SubId
which is some additional ID of some kind.
You also have an index IX_Foo
which I assume had only Id
in it for now.
So now you need to find the SubId
for Id=4
.
SELECT Id, SubId
FROM Foo
WHERE Id=4
- SQL Server will look at the SELECT statement and determine it can use
IX_Foo
- it will then go search for the value
Id=4
in your indexIX_Foo
- when it finds it, it now needs the value of
SubId
, too - the non-clustered index
IX_Foo
will contain the clustering key value - using that clustering key value, SQL Server will do a "bookmark lookup" to locate the actual data page where your entire data row is located
- it will fetch that page and extract the value for
SubId
from it - it will return those values to satisfy your query
The main point here is: once SQL Server has found your Id=4
in the IX_Foo
index, it will then need to do another I/O operation, a bookmark lookup, to go fetch the whole data row, in order to be able to find the SubId
value.
If you have a covering index, e.g. IX_Foo
also includes SubId
, that extra I/O to do the bookmark lookup is eliminated. Once the value Id=4
is found in the IX_Foo
index, that index page in your non-clustered index will also include the value of SubId
- SQL Server can now return those two values you asked for in your SELECT query without having to do an extra (potentially expensive and thus slow) bookmark lookup just to go fetch another Id column.
That's the main benefit of covering indices - if you only need one or two extra columns, besides the index values you're doing the lookup on, by including those values into the index itself, you can save yourself a lot of bookmark lookups and thus speed things up significantly. You should however only include very few, and small bits of information - don't duplicate your entire data rows into all non-clustered indices! That's not the point.
UPDATE: the trade-off is this: if you have an index on (Id, SubId), all the pages in the index have both columns - the whole index tree through.
If you INCLUDE(SubId), the SubId fields are only present at the leaf level.
This means
- SQL Server can't search and compare on SubId (the values are not in the index tree)
- less space is used since the values are only on the leaf level
The reason to have an additional column in an index is so that when you do a query that only requires the columns used by the index you can fulfill the query from the index by itself. This way you save some time and resources going back to the table. When this happens, we say the index is a covering index for the query.
The reason you might not want to make this additional column part of the "index proper" is because when you do inserts or updates on that column you're more likely to need to re-sort parts of the index.
Using include in an index allows the index to be used as a covering index (i.e. certain queries can be satisfied using that index alone, without having to perform a bookmark lookup into the clustered index), without adding those columns to the actual tree part of the index, thus keeping the size of the index down. (The included columns are only added to the leaf nodes of the index).
精彩评论