开发者

Considerations when dropping columns in large tables

I have a table of call data that has grown to 1.3 billion rows and 173 gigabytes of data There are two columns that we no longer use, one is char(15) and the other is varchar(24). They have both been getting inserted with NULL for some time, I've been putting off removing the columns because I am unsure of the implications. We have limited space on both the drive with the database and the drive with the transaction log.

In a开发者_开发百科ddition I found this post saying the space would not be available until a DBCC REINDEX was done. I see this as both good and bad. It's good because dropping the columns should be very fast and not involve a lot of logging, but bad because the space will not be reclaimed. Will newly inserted records take up less space though? That would be fine in my case as we prune the old data after 18 months so the space will gradually decrease.

If we did a DBCC REINDEX (or ALTER INDEX REBUILD) would that actually help since the columns are not part of any index? Would that take up log space or lock the table so it could not be used?


I found your question interesting, so decided to model it on a development database. SQL Server 2008, database size 400 Mb, log 2.4 Gb. I assume, from link provided you created a table with clustered index:

CREATE TABLE [dbo].[big_table](
    [recordID] [int] IDENTITY(1,1) NOT NULL,
    [col1] [varchar](50) NOT NULL,
    [col2] [char](15) NULL,
    [col3] [varchar](24) NULL,
 CONSTRAINT [PK_big_table] PRIMARY KEY CLUSTERED 
(
    [recordID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] 

This table consist of 12 Million records.

sp_spaceused big_table, true

name-big_table, rows-12031303, reserved-399240 KB, data-397760 KB, index_size-1336 KB, unused-144 KB. 

drop columns

sp_spaceused big_table, true

Table size stays the same. Database and log size remained the same.

add 3 million of rows to the rest of the table

name-big_table, rows-15031303, reserved-511816 KB, data-509904 KB, index_size-1752 KB, unused-160 KB.

database size 500 Mb, log 3.27 Gb.

After

DBCC DBREINDEX( big_table )

Log is the same size, but database size increased to 866 Mb

name-big_table, rows-12031303, reserved-338376 KB, data-337704  KB, index_size-568 KB, unused-104 KB. 

Again add 3 million rows to see if they going into available space within database. Database size is the same, log 3.96 Gb, which clearly shows they are.

Hope it makes sense.


No, newly inserted records would not take up less space. I was looking at this exact issue earlier today as it happens.

Test table

CREATE TABLE T
(
id int identity primary key,
FixedWidthColToBeDropped char(10),
VariableWidthColToBeDropped varchar(10),
FixedWidthColToBeWidened char(7),
FixedWidthColToBeShortened char(20),
VariableWidthColToBeWidened varchar(7),
VariableWidthColToBeShortened varchar(20),
VariableWidthColWontBeAltered varchar(20)
)

Offsets Query

WITH T
     AS (SELECT ISNULL(LEFT(MAX(name), 30), 'Dropped')  AS column_name,
                MAX(column_id)                          AS column_id,
                ISNULL(MAX(case
                             when column_id IS NOT NULL THEN max_inrow_length
                           END), MAX(max_inrow_length)) AS max_inrow_length,
                leaf_offset,
                CASE
                  WHEN leaf_offset < 0 THEN SUM(CASE
                                                  WHEN column_id IS NULL THEN 2 ELSE 0
                                                END)
                  ELSE MAX(max_inrow_length) - MAX(CASE
                       WHEN column_id IS NULL THEN 0
                       ELSE max_inrow_length
                                                   END)
                END                                     AS wasted_space
         FROM   sys.system_internals_partition_columns pc
                JOIN sys.partitions p
                  ON p.partition_id = pc.partition_id
                LEFT JOIN sys.columns c
                  ON column_id = partition_column_id
                     AND c.object_id = p.object_id
         WHERE  p.object_id = object_id('T')
         GROUP  BY leaf_offset)
SELECT CASE
         WHEN GROUPING(column_name) = 0 THEN column_name
         ELSE 'Total'
       END               AS column_name,
       column_id,
       max_inrow_length,
       leaf_offset,
       SUM(wasted_space) AS wasted_space
FROM   T  
GROUP  BY ROLLUP ((column_name,
                   column_id,
                   max_inrow_length,
                   leaf_offset))
ORDER  BY GROUPING(column_name),
          CASE
            WHEN leaf_offset > 0 THEN leaf_offset
            ELSE 10000 - leaf_offset
          END  

Initial State of the Table

column_name                    column_id   max_inrow_length leaf_offset wasted_space
------------------------------ ----------- ---------------- ----------- ------------
id                             1           4                4           0
FixedWidthColToBeDropped       2           10               8           0
FixedWidthColToBeWidened       4           7                18          0
FixedWidthColToBeShortened     5           20               25          0
VariableWidthColToBeDropped    3           10               -1          0
VariableWidthColToBeWidened    6           7                -2          0
VariableWidthColToBeShortened  7           20               -3          0
VariableWidthColWontBeAltered  8           20               -4          0
Total                          NULL        NULL             NULL        0

Now make some changes

ALTER TABLE T 
ALTER COLUMN FixedWidthColToBeWidened char(12)

ALTER TABLE T 
ALTER COLUMN FixedWidthColToBeShortened char(10)

ALTER TABLE T 
ALTER COLUMN VariableWidthColToBeWidened varchar(12)

ALTER TABLE T 
ALTER COLUMN VariableWidthColToBeShortened varchar(10)

ALTER TABLE T 
DROP COLUMN FixedWidthColToBeDropped, VariableWidthColToBeDropped

Look at the table again

column_name                    column_id   max_inrow_length leaf_offset wasted_space
------------------------------ ----------- ---------------- ----------- ------------
id                             1           4                4           0
Dropped                        NULL        10               8           10
Dropped                        NULL        7                18          7
FixedWidthColToBeShortened     5           10               25          10
FixedWidthColToBeWidened       4           12               45          0
Dropped                        NULL        10               -1          2
VariableWidthColToBeWidened    6           12               -2          0
Dropped                        NULL        20               -3          2
VariableWidthColWontBeAltered  8           20               -4          0
VariableWidthColToBeShortened  7           10               -5          0
Total                          NULL        NULL             NULL        31

Insert a row and look at the page

INSERT INTO T
           ([FixedWidthColToBeWidened]
           ,[FixedWidthColToBeShortened]
           ,[VariableWidthColToBeWidened]
           ,[VariableWidthColToBeShortened])
     VALUES
           ('1','2','3','4')

DECLARE @DBCCPAGE nvarchar(100)

SELECT TOP 1  @DBCCPAGE = 'DBCC PAGE(''tempdb'',' + CAST(file_id AS VARCHAR) + ',' + CAST(page_id AS VARCHAR) + ',3)'
FROM T
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) 

DBCC TRACEON(3604)
EXEC (@DBCCPAGE)    

Returns

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 75                     
Memory Dump @0x000000000D5CA060

0000000000000000:   30003900 01000000 26a44500 00000000 †0.9.....&¤E..... 
0000000000000010:   ffffffff ffffff7f 00322020 20202020 †ÿÿÿÿÿÿÿ..2       
0000000000000020:   20202003 00000000 98935c0d 00312020 †   ......\..1   
0000000000000030:   20202020 20202020 200a0080 00050049 †         ......I 
0000000000000040:   004a004a 004a004b 003334†††††††††††††.J.J.J.K.34      

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                               

Slot 0 Column 67108868 Offset 0x8 Length 0 Length (physical) 10

DROPPED = NULL                       

Slot 0 Column 67108869 Offset 0x0 Length 0 Length (physical) 0

DROPPED = NULL                       

Slot 0 Column 67108865 Offset 0x12 Length 0 Length (physical) 7

DROPPED = NULL                       

Slot 0 Column 67108866 Offset 0x19 Length 0 Length (physical) 20

DROPPED = NULL                       

Slot 0 Column 6 Offset 0x49 Length 1 Length (physical) 1

VariableWidthColToBeWidened = 3      

Slot 0 Column 67108867 Offset 0x0 Length 0 Length (physical) 0

DROPPED = NULL                       

Slot 0 Column 8 Offset 0x0 Length 0 Length (physical) 0

VariableWidthColWontBeAltered = [NULL]                                    

Slot 0 Column 4 Offset 0x2d Length 12 Length (physical) 12

FixedWidthColToBeWidened = 1                                              

Slot 0 Column 5 Offset 0x19 Length 10 Length (physical) 10

FixedWidthColToBeShortened = 2                                            

Slot 0 Column 7 Offset 0x4a Length 1 Length (physical) 1

VariableWidthColToBeShortened = 4    

Slot 0 Offset 0x0 Length 0 Length (physical) 0

KeyHashValue = (010086470766)      

You can see the dropped (and altered) columns are still consuming space even though the table was actually empty when the schema was changed.

The impact of the dropped columns in your case will be 15 bytes wasted for the char one and 2 bytes for the varchar one unless it is the last column in the variable section when it will take up no space.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜