Why to add clustered index keys to all (intermediate) nodes on NCI?

2023-01-22 11:26 问答作者：

Considering clustered table,

Quassnoi wrote (the last phrase in answer):

If the secondary index is declar开发者_开发技巧ed UNIQUE, clustered key is appended only to the leaf-level records of the secondary index.

This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)

What is the purpose of it?

Update:

Currently this question has 9 votes: -5, +4, started from mere anonymous -3), the correct answer contradicts to most msdn docs.

Its value is not in fact itself but in how to solve this kind of issues concerning of SQL Server internals either contradictory or incorrectly or insufficiently described in docs.

Update2: @Quassnoi,

thanks for your answer enriching my abilities to investigate myself without asking silly questions.

DBCC IND() does not output PageID. I undestood that its PagePID instead (from output of DBCC IND) corresponds to PageID in output of DBCC DBCC Page().

I have more questions on their use (and study/investigation of internals), or other alternatives. I am not sure why this type of questions is considered as spam here.

Can you advise me proper forums/board for this type of questions (on SQL Server internals)?

This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)

Yes, this is true.

This is done to improve the maintainability of an index.

Say, you have a secondary (non-clustered) index on column, 1,000,000 records with column = 1 and want to delete one of these records.

The record needs to be deleted from the index as well.

To locate the record to be deleted, a B-Tree search should be performed on the index. But if the branch nodes were not storing the value of the row pointer (be it a clustered key or a RID), the engine would have to scan all 1M records to determine the one to delete.

If the secondary key were UNIQUE, the value of the column would be enough to uniquely locate the node in the index, so storing the row pointer in the branch nodes is not required (and that's why they are not stored).

This discussion may be also interesting to you:

http://www.sqlservercentral.com/Forums/Topic714684-1545-6.aspx

Update:

To check the contents of the branch nodes, you can use DBCC IND:

CREATE TABLE t_clustered (id INT NOT NULL PRIMARY KEY, nval INT, uval INT)
CREATE TABLE t_nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED, nval INT, uval INT)

CREATE INDEX ix_clustered_nval ON t_clustered (nval)
CREATE UNIQUE INDEX ux_clustered_uval ON t_clustered (uval)
CREATE INDEX ix_nonclustered_nval ON t_nonclustered (nval)
CREATE UNIQUE INDEX ux_nonclustered_nval ON t_nonclustered (uval)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_clustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_nonclustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)

-- Replace mydb with your database name

DBCC IND (mydb, t_clustered, -1)
DBCC IND (mydb, t_nonclustered, -1)

In the output of these commands you should search for records with PageType = 2 (index page) and IndexLevel > 0 (non-leaf node) and find their PageID.

In my case, I got the following PageID: 21074, 21076, 21105, 21107. Note they are site specific: you will have the other values.

Then you should used DBCC PAGE to examine the contents of these pages:

DBCC PAGE (mydb, 1, 21074, 3)
DBCC PAGE (mydb, 1, 21076, 3)
DBCC PAGE (mydb, 1, 21105, 3)
DBCC PAGE (mydb, 1, 21107, 3)

FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  id (key)    KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  HEAP RID (key)     KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue

We see that the non-leaf nodes of the non-unique secondary index on nval contain record pointers (id (PRIMARY KEY CLUSTERED) and RID, appropriately), while those of the unique index on uval do not contain record pointers, only the values on the indexed column itself.

This is, again, because with a unique index the value of the column indexed is sufficient to locate its node in the index, while with a non-unique index it's not.

You are asking questions re what others have said without any understanding of the subject matter (IT; B-Trees; Index structures) re what they said, made statements about. This is an answer service, not a tutorial service.

"This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index"

No. Quassnoi said nothing of the sort. You cannot take statements (answers within a context; the question) and evaluate them in isolation. The CI key is only applicable to the leaf level, not the "intermediate nodes".

"And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)"

Logic ? No again. The determination that the elephants tail is made of thick, long hairs, does not imply that the trunk is made of hair also.

Ask another question re the non-leaf nodes of a non-unique, non-clustered index. I am getting a bit non-plussed about the non-issue.

Answer. For your now consistently evidenced level of understanding, the nonclustered index has the full clustered key value as the entry at the leaf level. Period. End of story. That is no big deal because (a) the number of steps are the same (b) the CI index (not leaf) will be in cache anyway, and thus very fast, without requiring disk access until the last (leaf level).

NCI key lookup, No CI: Index lookup -> RID -> Data row lookup -> Data Row

NCI key lookup, with CI: Index lookup -> CI key -> Clustered index lookup -> Data Row

What is the purpose of it?

Performance. All vendors understand that the slowest component in the chain of functions activated by a query, is the disk, the only component with moving parts. They all do their best to avoid disk access, and improve performance. The Index itself is the most basic structure for avoiding disk access, since the 1960's. The basic B-Tree has not changed since then, it has merely had a million tiny advancements.

Now the problem is, while that is true, each vendor has (a) has their own little special techniques that enhance (add to, without changing the basic operation as described in my posts to you) the operation and (b) in the MicroShifty world, it changes all the time, because the enhancements are, well, not really enhancements. Point being, that excruciatingly low level is not relevant to understanding how indices work; or whether a CI or NCI is good for your particular use; or the advantages/disadvantages of each.

I have already identified, in order to assist you, do not get involved in the lower levels, until you understand the basics, higher levels ... if you do, you will get lost, and it will be an obstacle to your presented intention of learning. As evidenced here. Again.

继续阅读：database database-design indexing performance sql-server

Why to add clustered index keys to all (intermediate) nodes on NCI?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？