Is it bad to have a non-clustered index that contains the primary key from the clustered index?
If you have a table with a clustered index on the Primary Key (int), is it redundant and bad to have one (or more) non-clustered indexes that include that primary key column as one of the columns in the non-clustered index?
Actually there could be valid reasons to create a non-clustered index identical to the clustered one. The reason is that clustered indexes carry the baggage of the row data, and this can make for very poor row density. That is, you can have 2-3 rows per page due to wide fields that are not in the clustered key, while the clustered index key is only, say, 20 bytes. Having a non-clustered index on exactly the same key(s) and order as the clustered index would give a density of 200-300 keys per page. A lot of aggregate queries typical for an OLAP/BI workload can be answered more efficiently by the non-clustered index, simply because it reduces the I/O by a factor in the hundreds.
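To put rough numbers on that density argument, here is a back-of-the-envelope sketch. It assumes about 8,096 usable bytes on an 8 KB page and ignores per-row overhead, so the real key density will be somewhat lower than what it prints. The 3,000-byte row width and the 1,000,000-row table are made-up figures for illustration; only the 20-byte key comes from the example above.

```python
import math

PAGE_BYTES = 8096  # assumed usable bytes per 8 KB page (simplification)

def rows_per_page(entry_bytes, page_bytes=PAGE_BYTES):
    """How many fixed-size entries fit on one page, ignoring row overhead."""
    return page_bytes // entry_bytes

def pages_needed(row_count, entry_bytes, page_bytes=PAGE_BYTES):
    """Leaf pages needed to hold row_count entries of the given size."""
    return math.ceil(row_count / rows_per_page(entry_bytes, page_bytes))

ROWS = 1_000_000   # hypothetical table size
WIDE_ROW = 3000    # hypothetical full row width in bytes (clustered index leaf)
KEY_ONLY = 20      # key width from the example above (non-clustered leaf)

clustered_pages = pages_needed(ROWS, WIDE_ROW)
nonclustered_pages = pages_needed(ROWS, KEY_ONLY)

print(rows_per_page(WIDE_ROW), "rows per clustered leaf page")
print(rows_per_page(KEY_ONLY), "keys per non-clustered leaf page")
print(clustered_pages, "pages to scan the clustered index")
print(nonclustered_pages, "pages to scan the non-clustered index")
print(round(clustered_pages / nonclustered_pages), "x fewer pages read")
```

With these assumed sizes, a full scan for something like a COUNT or a SUM over the key touches a couple of hundred times fewer pages through the narrow index, which is where the I/O saving comes from.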
As for non-clustered indexes that contain parts of the clustered key, or even the same keys but in different order, then all bets are off as they obviously could be used for a multitude of queries.
So the answer to your question is: It Depends.
For a more precise answer you'll have to share the exact schema of your table(s) and the exact queries involved.
Yes, it is typically not necessary, because the columns of the clustered index are already added to each index entry in the non-clustered index.
Why? The value of the clustered key is what really allows SQL Server to "find" a row of data - it's the "pointer" to the actual data - so obviously, it has to be stored in the non-clustered index. If you have looked up "Smith, John" and you need to know more about this person, you need to go to the actual data --> and that is done by including the value of the clustering key in the index node of the non-clustered index.
That clustered key value is already there, and thus typically it's redundant and unnecessary to add that value again, explicitly, to your non-clustered index. It's bad in that it just simply wastes space without giving you any benefit.
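A toy model may make this easier to picture. This is only a sketch in plain Python, not how SQL Server stores anything internally: the dict stands in for the clustered index (rows organized by primary key), the sorted list stands in for a non-clustered index on last name, and all names and sample rows are invented for illustration.

```python
from bisect import bisect_left

# "Clustered index": the rows themselves, organized by primary key.
clustered = {
    1: {"last_name": "Smith", "first_name": "John", "city": "Leeds"},
    2: {"last_name": "Adams", "first_name": "Ann", "city": "York"},
    3: {"last_name": "Smith", "first_name": "Zoe", "city": "Bath"},
}

# "Non-clustered index" on last_name: each entry is (index key, clustering key).
# The primary key is in every entry because it is the pointer back to the row.
nonclustered = sorted((row["last_name"], pk) for pk, row in clustered.items())

def lookup(last_name):
    """Seek in the non-clustered index, then fetch each row via its clustering key."""
    rows = []
    i = bisect_left(nonclustered, (last_name,))  # first entry with this key, if any
    while i < len(nonclustered) and nonclustered[i][0] == last_name:
        pk = nonclustered[i][1]      # clustering key stored in the index entry
        rows.append(clustered[pk])   # the "go to the actual data" step
        i += 1
    return rows

print(nonclustered)
print([row["first_name"] for row in lookup("Smith")])
```

Since the primary key already rides along in every entry, listing it again as an explicit column of the index adds nothing to the lookup in this model.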
I'm with Remus on this - a clustered index is not really an index - it tells you how the data is organized in pages. (In your case, it's also the primary key, but that's not required to be the same thing). Non-clustered indexes include that row locator information, so yes, it is redundant.
But if a non-clustered index is covering and the data row bookmark doesn't need to be used, it can be used a lot more efficiently than the clustered index, and the efficiency increases as the ratio of the size of the data row to the size of the non-clustered index increases.
I've found that if you have a good handle on the access paths in your query workload, a few selective covering non-clustered indexes can sometimes be used to eliminate clustering choices completely - heap table, a PK, and some good non-clustered indexes, and you're done.
There's no 100% answer, but it is almost definitely redundant.
The other indexes are there to assist in helping with joins and sorting (generally). Given that the primary key is already indexed, if the optimizer can join based on that it'll use that.
If another index is needed from a join/sort perspective, what additional help does having the PK in the index mix provide? If it couldn't join based on the PK before, it's not going to now. And it's not really going to help any with sorting either.