Considerations when dropping columns in large tables
I have a table of call data that has grown to 1.3 billion rows and 173 gigabytes of data There are two columns that we no longer use, one is char(15) and the other is varchar(24). They have both been getting inserted with NULL for some time, I've been putting off removing the columns because I am unsure of the implications. We have limited space on both the drive with the database and the drive with the transaction log.
In a开发者_开发百科ddition I found this post saying the space would not be available until a DBCC REINDEX was done. I see this as both good and bad. It's good because dropping the columns should be very fast and not involve a lot of logging, but bad because the space will not be reclaimed. Will newly inserted records take up less space though? That would be fine in my case as we prune the old data after 18 months so the space will gradually decrease.
If we did a DBCC REINDEX (or ALTER INDEX REBUILD) would that actually help since the columns are not part of any index? Would that take up log space or lock the table so it could not be used?
I found your question interesting, so decided to model it on a development database. SQL Server 2008, database size 400 Mb, log 2.4 Gb. I assume, from link provided you created a table with clustered index:
CREATE TABLE [dbo].[big_table](
[recordID] [int] IDENTITY(1,1) NOT NULL,
[col1] [varchar](50) NOT NULL,
[col2] [char](15) NULL,
[col3] [varchar](24) NULL,
CONSTRAINT [PK_big_table] PRIMARY KEY CLUSTERED
(
[recordID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This table consist of 12 Million records.
sp_spaceused big_table, true
name-big_table, rows-12031303, reserved-399240 KB, data-397760 KB, index_size-1336 KB, unused-144 KB.
drop columns
sp_spaceused big_table, true
Table size stays the same. Database and log size remained the same.
add 3 million of rows to the rest of the table
name-big_table, rows-15031303, reserved-511816 KB, data-509904 KB, index_size-1752 KB, unused-160 KB.
database size 500 Mb, log 3.27 Gb.
After
DBCC DBREINDEX( big_table )
Log is the same size, but database size increased to 866 Mb
name-big_table, rows-12031303, reserved-338376 KB, data-337704 KB, index_size-568 KB, unused-104 KB.
Again add 3 million rows to see if they going into available space within database. Database size is the same, log 3.96 Gb, which clearly shows they are.
Hope it makes sense.
No, newly inserted records would not take up less space. I was looking at this exact issue earlier today as it happens.
Test table
CREATE TABLE T
(
id int identity primary key,
FixedWidthColToBeDropped char(10),
VariableWidthColToBeDropped varchar(10),
FixedWidthColToBeWidened char(7),
FixedWidthColToBeShortened char(20),
VariableWidthColToBeWidened varchar(7),
VariableWidthColToBeShortened varchar(20),
VariableWidthColWontBeAltered varchar(20)
)
Offsets Query
WITH T
AS (SELECT ISNULL(LEFT(MAX(name), 30), 'Dropped') AS column_name,
MAX(column_id) AS column_id,
ISNULL(MAX(case
when column_id IS NOT NULL THEN max_inrow_length
END), MAX(max_inrow_length)) AS max_inrow_length,
leaf_offset,
CASE
WHEN leaf_offset < 0 THEN SUM(CASE
WHEN column_id IS NULL THEN 2 ELSE 0
END)
ELSE MAX(max_inrow_length) - MAX(CASE
WHEN column_id IS NULL THEN 0
ELSE max_inrow_length
END)
END AS wasted_space
FROM sys.system_internals_partition_columns pc
JOIN sys.partitions p
ON p.partition_id = pc.partition_id
LEFT JOIN sys.columns c
ON column_id = partition_column_id
AND c.object_id = p.object_id
WHERE p.object_id = object_id('T')
GROUP BY leaf_offset)
SELECT CASE
WHEN GROUPING(column_name) = 0 THEN column_name
ELSE 'Total'
END AS column_name,
column_id,
max_inrow_length,
leaf_offset,
SUM(wasted_space) AS wasted_space
FROM T
GROUP BY ROLLUP ((column_name,
column_id,
max_inrow_length,
leaf_offset))
ORDER BY GROUPING(column_name),
CASE
WHEN leaf_offset > 0 THEN leaf_offset
ELSE 10000 - leaf_offset
END
Initial State of the Table
column_name column_id max_inrow_length leaf_offset wasted_space
------------------------------ ----------- ---------------- ----------- ------------
id 1 4 4 0
FixedWidthColToBeDropped 2 10 8 0
FixedWidthColToBeWidened 4 7 18 0
FixedWidthColToBeShortened 5 20 25 0
VariableWidthColToBeDropped 3 10 -1 0
VariableWidthColToBeWidened 6 7 -2 0
VariableWidthColToBeShortened 7 20 -3 0
VariableWidthColWontBeAltered 8 20 -4 0
Total NULL NULL NULL 0
Now make some changes
ALTER TABLE T
ALTER COLUMN FixedWidthColToBeWidened char(12)
ALTER TABLE T
ALTER COLUMN FixedWidthColToBeShortened char(10)
ALTER TABLE T
ALTER COLUMN VariableWidthColToBeWidened varchar(12)
ALTER TABLE T
ALTER COLUMN VariableWidthColToBeShortened varchar(10)
ALTER TABLE T
DROP COLUMN FixedWidthColToBeDropped, VariableWidthColToBeDropped
Look at the table again
column_name column_id max_inrow_length leaf_offset wasted_space
------------------------------ ----------- ---------------- ----------- ------------
id 1 4 4 0
Dropped NULL 10 8 10
Dropped NULL 7 18 7
FixedWidthColToBeShortened 5 10 25 10
FixedWidthColToBeWidened 4 12 45 0
Dropped NULL 10 -1 2
VariableWidthColToBeWidened 6 12 -2 0
Dropped NULL 20 -3 2
VariableWidthColWontBeAltered 8 20 -4 0
VariableWidthColToBeShortened 7 10 -5 0
Total NULL NULL NULL 31
Insert a row and look at the page
INSERT INTO T
([FixedWidthColToBeWidened]
,[FixedWidthColToBeShortened]
,[VariableWidthColToBeWidened]
,[VariableWidthColToBeShortened])
VALUES
('1','2','3','4')
DECLARE @DBCCPAGE nvarchar(100)
SELECT TOP 1 @DBCCPAGE = 'DBCC PAGE(''tempdb'',' + CAST(file_id AS VARCHAR) + ',' + CAST(page_id AS VARCHAR) + ',3)'
FROM T
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
DBCC TRACEON(3604)
EXEC (@DBCCPAGE)
Returns
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 75
Memory Dump @0x000000000D5CA060
0000000000000000: 30003900 01000000 26a44500 00000000 †0.9.....&¤E.....
0000000000000010: ffffffff ffffff7f 00322020 20202020 †ÿÿÿÿÿÿÿ..2
0000000000000020: 20202003 00000000 98935c0d 00312020 † ......\..1
0000000000000030: 20202020 20202020 200a0080 00050049 † ......I
0000000000000040: 004a004a 004a004b 003334†††††††††††††.J.J.J.K.34
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
id = 1
Slot 0 Column 67108868 Offset 0x8 Length 0 Length (physical) 10
DROPPED = NULL
Slot 0 Column 67108869 Offset 0x0 Length 0 Length (physical) 0
DROPPED = NULL
Slot 0 Column 67108865 Offset 0x12 Length 0 Length (physical) 7
DROPPED = NULL
Slot 0 Column 67108866 Offset 0x19 Length 0 Length (physical) 20
DROPPED = NULL
Slot 0 Column 6 Offset 0x49 Length 1 Length (physical) 1
VariableWidthColToBeWidened = 3
Slot 0 Column 67108867 Offset 0x0 Length 0 Length (physical) 0
DROPPED = NULL
Slot 0 Column 8 Offset 0x0 Length 0 Length (physical) 0
VariableWidthColWontBeAltered = [NULL]
Slot 0 Column 4 Offset 0x2d Length 12 Length (physical) 12
FixedWidthColToBeWidened = 1
Slot 0 Column 5 Offset 0x19 Length 10 Length (physical) 10
FixedWidthColToBeShortened = 2
Slot 0 Column 7 Offset 0x4a Length 1 Length (physical) 1
VariableWidthColToBeShortened = 4
Slot 0 Offset 0x0 Length 0 Length (physical) 0
KeyHashValue = (010086470766)
You can see the dropped (and altered) columns are still consuming space even though the table was actually empty when the schema was changed.
The impact of the dropped columns in your case will be 15
bytes wasted for the char
one and 2 bytes for the varchar
one unless it is the last column in the variable section when it will take up no space.
精彩评论