Why to add clustered index keys to all (intermediate) nodes on NCI?
Considering clustered table,
Quassnoi wrote (the last phrase in answer):- If the secondary index is declar开发者_开发技巧ed UNIQUE, clustered key is appended only to the leaf-level records of the secondary index.
This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)
What is the purpose of it?
Update:
Currently this question has 9 votes: -5, +4, started from mere anonymous -3), the correct answer contradicts to most msdn docs. Its value is not in fact itself but in how to solve this kind of issues concerning of SQL Server internals either contradictory or incorrectly or insufficiently described in docs.Update2: @Quassnoi,
thanks for your answer enriching my abilities to investigate myself without asking silly questions.DBCC IND() does not output PageID. I undestood that its PagePID instead (from output of DBCC IND) corresponds to PageID in output of DBCC DBCC Page().
I have more questions on their use (and study/investigation of internals), or other alternatives. I am not sure why this type of questions is considered as spam here. Can you advise me proper forums/board for this type of questions (on SQL Server internals)?This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)
Yes, this is true.
This is done to improve the maintainability of an index.
Say, you have a secondary (non-clustered) index on column
, 1,000,000
records with column = 1
and want to delete one of these records.
The record needs to be deleted from the index as well.
To locate the record to be deleted, a B-Tree
search should be performed on the index. But if the branch nodes were not storing the value of the row pointer (be it a clustered key or a RID
), the engine would have to scan all 1M
records to determine the one to delete.
If the secondary key were UNIQUE
, the value of the column
would be enough to uniquely locate the node in the index, so storing the row pointer in the branch nodes is not required (and that's why they are not stored).
This discussion may be also interesting to you:
http://www.sqlservercentral.com/Forums/Topic714684-1545-6.aspx
Update:
To check the contents of the branch nodes, you can use DBCC IND
:
CREATE TABLE t_clustered (id INT NOT NULL PRIMARY KEY, nval INT, uval INT)
CREATE TABLE t_nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED, nval INT, uval INT)
CREATE INDEX ix_clustered_nval ON t_clustered (nval)
CREATE UNIQUE INDEX ux_clustered_uval ON t_clustered (uval)
CREATE INDEX ix_nonclustered_nval ON t_nonclustered (nval)
CREATE UNIQUE INDEX ux_nonclustered_nval ON t_nonclustered (uval)
;
WITH q(id) AS
(
SELECT 1
UNION ALL
SELECT id + 1
FROM q
WHERE id < 10000
)
INSERT
INTO t_clustered
SELECT id, (id - 1) / 10 + 1, id
FROM q
OPTION (MAXRECURSION 0)
;
WITH q(id) AS
(
SELECT 1
UNION ALL
SELECT id + 1
FROM q
WHERE id < 10000
)
INSERT
INTO t_nonclustered
SELECT id, (id - 1) / 10 + 1, id
FROM q
OPTION (MAXRECURSION 0)
-- Replace mydb with your database name
DBCC IND (mydb, t_clustered, -1)
DBCC IND (mydb, t_nonclustered, -1)
In the output of these commands you should search for records with PageType = 2
(index page) and IndexLevel > 0
(non-leaf node) and find their PageID
.
In my case, I got the following PageID
: 21074, 21076, 21105, 21107. Note they are site specific: you will have the other values.
Then you should used DBCC PAGE
to examine the contents of these pages:
DBCC PAGE (mydb, 1, 21074, 3)
DBCC PAGE (mydb, 1, 21076, 3)
DBCC PAGE (mydb, 1, 21105, 3)
DBCC PAGE (mydb, 1, 21107, 3)
FileId PageId Row Level ChildFileId ChildPageId nval (key) id (key) KeyHashValue
FileId PageId Row Level ChildFileId ChildPageId uval (key) KeyHashValue
FileId PageId Row Level ChildFileId ChildPageId nval (key) HEAP RID (key) KeyHashValue
FileId PageId Row Level ChildFileId ChildPageId uval (key) KeyHashValue
We see that the non-leaf nodes of the non-unique secondary index on nval
contain record pointers (id (PRIMARY KEY CLUSTERED)
and RID
, appropriately), while those of the unique index on uval
do not contain record pointers, only the values on the indexed column itself.
This is, again, because with a unique index the value of the column indexed is sufficient to locate its node in the index, while with a non-unique index it's not.
You are asking questions re what others have said without any understanding of the subject matter (IT; B-Trees; Index structures) re what they said, made statements about. This is an answer service, not a tutorial service.
"This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index"
No. Quassnoi said nothing of the sort. You cannot take statements (answers within a context; the question) and evaluate them in isolation. The CI key is only applicable to the leaf level, not the "intermediate nodes".
"And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)"
Logic ? No again. The determination that the elephants tail is made of thick, long hairs, does not imply that the trunk is made of hair also.
Ask another question re the non-leaf nodes of a non-unique, non-clustered index. I am getting a bit non-plussed about the non-issue.
Answer. For your now consistently evidenced level of understanding, the nonclustered index has the full clustered key value as the entry at the leaf level. Period. End of story. That is no big deal because (a) the number of steps are the same (b) the CI index (not leaf) will be in cache anyway, and thus very fast, without requiring disk access until the last (leaf level).
NCI key lookup, No CI: Index lookup -> RID -> Data row lookup -> Data Row
NCI key lookup, with CI: Index lookup -> CI key -> Clustered index lookup -> Data Row
What is the purpose of it?
Performance. All vendors understand that the slowest component in the chain of functions activated by a query, is the disk, the only component with moving parts. They all do their best to avoid disk access, and improve performance. The Index itself is the most basic structure for avoiding disk access, since the 1960's. The basic B-Tree has not changed since then, it has merely had a million tiny advancements.
Now the problem is, while that is true, each vendor has (a) has their own little special techniques that enhance (add to, without changing the basic operation as described in my posts to you) the operation and (b) in the MicroShifty world, it changes all the time, because the enhancements are, well, not really enhancements. Point being, that excruciatingly low level is not relevant to understanding how indices work; or whether a CI or NCI is good for your particular use; or the advantages/disadvantages of each.
I have already identified, in order to assist you, do not get involved in the lower levels, until you understand the basics, higher levels ... if you do, you will get lost, and it will be an obstacle to your presented intention of learning. As evidenced here. Again.
精彩评论