Are lookups on IDENTITY columns faster if the address space is contiguous?
If I have a very large table with an IDENTITY
column (bigint
) and this table is subject to de开发者_StackOverflow社区letes, will the fragmentation of the address space (the available IDs) result in slower SELECTS
?
Clarification:
By address space fragmentation I mean the gaps left in the values in the ID column, not the fragmentation on the disk that occurs when deleting rows from the table.
Obviously, this can’t be known for certain without running tests. However, I will speculate that it would not run any faster or slower if it is contiguous or fragmented.
Assuming that your column is indexed—SQL Server stores indexes in a b-tree. The nodes of this tree are designed to be of optimal size for the system’s paging system. A search within the node is going to identify the correct child-node pages regardless of the fragmentation of the indexes. Since the time it takes to load the pages is going to swamp the time it takes to search within the nodes, I don't think fragmentation will have any effect on the lookup times.
If the column is not indexed, on the other hand, then SQL Server has to do a full table scan anyway, so the distribution of values is not going to affect the time at all.
The one place that it could have an effect is on query plan optimization. SQL Server stores histograms of column contents so that it may choose a query plan that will give adequate performance. It seems possible to me that fragmentation could cause it to choose a plan that is not as good as another plan that it might have considered. This would be the case if the column statistics are not up-to-date. If the statistics are up-to-date, then the fragmentation would be detected, and the optimizer could take that information into consideration.
It depends on how the table is indexed. Assuming that you are also placing a clustered index on the column, any select statements using that column should be very fast. In the extreme, it might require some extra pages being pulled into memory if you're doing range scans, but on an ID it's pretty rare to be doing something like that. Usually you're getting those rows by seeks.
You should still have a maintenance plan that cleans up any fragmentation, especially if you are doing a lot of deletes. I have to admit that this isn't my strong suit though, so I don't know if SQL 2008 and/or your storage medium might make this unnecessary.
No.
If it is not an index column, certainly the fragmentation does not matter because anyway engine does table scan. (Depending on the actually query/join, the engine is not necessarily doing table scan but from the perspective of the column itself, engine has nothing better to do other than table scan)
If it is an indexed column, the index is stored in a tree structure that is expanded or shrunk during data insertion or deletion. One thing you need to know for this tree structure is that it is "fragmented" even if you insert data sequentially. The fragmentation here is not in the sense of disk allocation unit, but each node in the tree is not fully used for the data range it covers. The intended fragmentation is to avoid too frequent tree restructuring. The engine uses an occupancy ratio when it restructures a index tree (which can be specified during the creation of an index). So, no matter if the id is continuous or not, it is stored in a bigger storage space with some "gaps" in it. Deletion on the column should not create any noticeable difference on performance.
However, database also allocates storage space by pages, which means there will be fragmentation and related performance issue. But to your question, this has nothing to do with the deletion on the id column.
精彩评论