开发者

How is sequential traversing between intermediate leaves of clustered index used?

Intermediate leaves of cl开发者_如何学Goustered index are linked sequentially (next, previous) for faster access (between intermediate nodes) [1], [2], etc.:

How is sequential traversing between intermediate leaves of clustered index used?

How this access is used?

Why is it needed?

[1]

Clustered Index Structures

http://msdn.microsoft.com/en-us/library/ms177443.aspx

[2]

Clustered Tables vs Heap Tables

http://www.mssqltips.com/tip.asp?tip=1254

Update: Follow-up question for answerers:

  • What are the helpful cases of Internals Viewer for SQL Server to convince me to use it?


The clustered index (and not non-clustered indices) can be used for range queries. Do you know what that is ? Horizontal traversal of the B-Tree enhances the speed of navigating the CI when determining qualified rows during range queries.

In a more general sense, if the server cache is too small, and the CI pages get paged out, when any query (not only the range queries) need to get the next page when walking down, or sideways, through a CI, it can get the page with a single disk access, because the pages are linked by a pointer; ie. it avoids walking back up one level to find the next page). Just one of the many reasons CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them (your other question today).

The diagram has mistakes (contains false info), or to put it more precisely, it is a descriptive, non-technical diagram, from a non-technical corporation:

  1. The intermediate levels have a single pointer to the page at the next level (not multiple pointers).

  2. The leaf level IS the data row. There are no pointers to rows (at the intermediate OR leaf level).

  3. The Index Pages do not resemble a page of text and images. Each Index Page contains hundreds of index B-Tree entries.

  4. The Root page is different only in that the first entry is the single root to the index ; it contains hundreds of entries which are of course second level, and may be third level, etc.

There is a reason technicians draw, and use, technical drawings: among others, it avoids misunderstandings and confusion. No questions re the Diagram I Made for You ?

Response to Martin Smith's Post

a. Me: The clustered index (and not non-clustered indices) can be used for range queries

MS: Incorrect: Non-clustered indices can be used perfectly well for range queries as long as the Non Clustered Index is covering.

It appears you understand a Covered Query, but you do not understand a Range Query. Please read up on it. It is unfortunately named "query", but actually it is a performance technique that all the SQL vendors provide. Say you have a real Relational table, which means a composite key, eg. Invoice PK is (CustomerId, InvoiceNo) [not InvoiceId]. Then a query such as:

SELECT * FROM Invoice WHERE CustomerId = @CustomerId

will navigate the B-Tree of the ClusteredIndex once, to find the fist Invoice for the CustomerId. It will then follow the PageChain of the LeafLevel (data rows) to obtain the second and subsequent Invoice for the CustomerId. There is no further use of the B-Tree for the query. The Range Query ends when the first Invoice with CustomerId > 1 is encountered.

That is only possible with a ClusteredIndex, where the B-Tree is married to the Data, in a single physical structure.

That is physically impossible with a NonClusteredIndex-plus-Data (which is a Heap or a ClusteredIndex). Which is why Range Queries cannot be supported for NCIs. Even if you had an NCI with (CustomerId, InvoiceNo), the data rows will not be in that order; they will be in chronological order in the Heap; so the query that uses that NCI will extract one-row-per-NCI-entry.

b. Me: CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them

MS: The B tree structure of a clustered index is no different from a non clustered index. CIs are not enhanced or somehow have a different and superior structure ...

No dispute there. You have simply misunderstood me, re speed, I was talking about the table overall (NonClusteredIndices cannot exist on their own). Let me clarify: Given the same Key, a ClusteredIndex (which includes data) is always much faster than a NonClusteredIndex-plus-Heap. Navigating, maintaining, creating, deleting from, a single data-storage structure (the CI), is obviously much faster than doing the same to two data-storage structures (NCI+Heap).

It is not physically possible to make two DSs faster than one DS (assuming with the same key.)

c. Not worth a response. It appears you do not realise that my comments pertain to the incorrect diagrams. Put another way, your comments (and proof) are also quite correct.


First , as pointed out PerfomanceDBA, in order to understand SQL Server internals it is better to use Sybase documentation and terminology.

Second, a good explanation where, why and how intermediate level consecutive transversing is explained in [1]:

  • "Read-Ahead in a key-ordered scan

    With key-ordered scans, the engine uses the information stored in the intermediate index pages 1 level above the leaf level to schedule serial read-aheads for pages that contain the keys found. If a request is made for example for all keys from 1 to 100, the engine will first read the index page above the leaf page for key 1 (on it's way to traversing to the leaf page); however, instead of simply reading each leaf page in sequence from page 1 to page 100, the engine scans the intermediate level page and builds a list of the leaf pages that must be read to get pages 1 thru 100, then schedules all reads in key order - in addition, the engine will recognize if pages are contiguous and perform a single read to retrieve the adjacent pages in a single operation as opposed to multiple, smaller operations. A similar type of operation is used to pre-fetch data from a base cluster or heap when scanning through a non-clustered index - since the leaf rows of a non-clustered index contain pointers to the data rows in the cluster/heap structure, as the storage engine reads through the leaf of the non-clustered index, it also starts scheduling asynchronous reads for the corresponding data rows whose pointers have already been retrieved. This allows the engine to fetch data efficiently from the underlying cluster/heap before it has completed the scan of the non-clustered index.

    The navigation for a read-ahead in a key-ordered scan would look something like the following:

How is sequential traversing between intermediate leaves of clustered index used?


"

[1]
Chad Boyd
MSSQLTips - SQL Server Blog
Fragmentation Station - Stop #1 - Storage basics and Access Methods http://blogs.mssqltips.com/blogs/chadboyd/archive/2007/11/12/fragmentation-station-stop-1-storage-basics-and-access-methods.aspx


The diagram you have in your question represents indexes in Microsoft SQL Server perfectly accurately.

To address some of the aspects of PerformanceDBA's answer that I believe to be incorrect or inadequately explained.

"The clustered index (and not non-clustered indices) can be used for range queries"

Incorrect: Non-clustered indices can be used perfectly well for range queries as long as the Non Clustered Index is covering.

"CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them"

The B tree structure of a clustered index is no different from a non clustered index. CIs are not enhanced or somehow have a different and superior structure. If anything NCIs are slightly enhanced in that they don't always have a NULL_BITMAP and the "Status Bits B" byte and are thus may be slightly more compact.

"The intermediate levels have a single pointer to the page at the next level (not multiple pointers) ... There are no pointers to rows (at the intermediate OR leaf level)."

USE tempdb

IF OBJECT_ID('testing') IS NULL
BEGIN
    CREATE TABLE testing
    (
    a INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    b INT NOT NULL,
    c CHAR(4000) NOT NULL DEFAULT REPLICATE('c',4000),
    d CHAR(4000) NOT NULL DEFAULT REPLICATE('d',4000)
    )

    CREATE UNIQUE NONCLUSTERED  INDEX ix ON testing (b) INCLUDE (d)

    INSERT INTO testing (b)
    SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM sys.all_columns s1, sys.all_columns s2
END

IF OBJECT_ID('index_pages') IS NULL
BEGIN
CREATE TABLE index_pages
(
PageFID TINYINT,
PagePID INT,
IAMFID TINYINT,
IAMPID INT,
ObjectID INT,
IndexID TINYINT,
PartitionNumber TINYINT,
PartitionID BIGINT,
iam_chain_type VARCHAR(30),
PageType TINYINT,
IndexLevel TINYINT,
NextPageFID TINYINT,
NextPagePID INT,
PrevPageFID TINYINT,
PrevPagePID INT,
PRIMARY KEY (PageFID, PagePID)
) 
END
ELSE
TRUNCATE TABLE index_pages

INSERT INTO index_pages
EXEC('DBCC IND(tempdb, testing, 2)') 

SELECT * 
FROM index_pages 
ORDER BY IndexLevel DESC

You will see that the level one (intermediate level) pages have horizontal pointers denoted by the NextPagePID and PrevPagePID columns. In addition to these page level pointers each index entry has a pointer to a page in the next level down as indicated correctly in the diagram.

To see this select one of the PagePIDs belonging to a level one page and look at that page in Internals Viewer for SQL Server. You will see (as below) that each index record has its own Down Page Pointer.

In the particular individual record selected in the screen shot below it can be seen that it is showing that the first record on the leaf page 1:186 will have a key value of 13 or later.

How is sequential traversing between intermediate leaves of clustered index used?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜