SQL Query to delete oldest rows over a certain row count?
I have a table that contains log entries for a program I'm writing. I'm looking for ideas on an SQL query (I'm using SQL Server Express 2005) that will keep the newest X number of records, and delete the rest. I have a da开发者_StackOverflowtetime column that is a timestamp for the log entry.
I figure something like the following would work, but I'm not sure of the performance with the IN clause for larger numbers of records. Performance isn't critical, but I might as well do the best I can the first time.
DELETE FROM MyTable WHERE PrimaryKey NOT IN
(SELECT TOP 10,000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC)
I should mention that this query will run 3-4 times a day (as part of another process), so the number of records that will be deleted with each query will be small in comparison to the number of records that will be kept.
Try this:
DECLARE @X int
SELECT @X=COUNT(*) FROM MyTable
SET @X=@X-10000
DELETE MyTable
WHERE PrimaryKey IN (SELECT TOP(@x) PrimaryKey
FROM MyTable
ORDER BY TimeStamp ASC
)
kind of depends on if you are deleting fewer than 10,000 rows, if so this might run faster, as it identifies the rows to delete, not the rows to keep.
Try this, uses a CTE to get the row ordinal number, and then only deletes X number of rows at a time. You can alter this variable to suit your server.
Adding ReadPast table hint should prevent locking.
:
DECLARE @numberToDelete INT;
DECLARE @ROWSTOKEEP INT;
SET @ROWSTOKEEP = 50000;
SET @numberToDelete =1000;
WHILE 1=1
BEGIN
WITH ROWSTODELETE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY dtsTimeStamp DESC) rn,
*
FROM MyTable
)
DELETE TOP (@numberToDelete) FROM ROWSTODELETE WITH(READPAST)
WHERE rn>@ROWSTOKEEP;
IF @@ROWCOUNT=0
BREAK;
END;
The query you have is about as efficient as it gets, and is readable.
NOT IN
and NOT EXISTS
are more efficient than LEFT JOIN/IS NULL
, but only because both columns can never be null. You can read this link for a more in-depth comparison.
This depends on your scenario (whether it's feasible for you) and how many rows you have, but there is a potentially far more optimal approach.
- Create a new copy of the log table with a new name
- Insert into the new table, the most recent 10,000 records from the original table
- Drop the original table (or rename)
- Rename the new table, to the proper name
This obviously requires more thought than just deleting rows (e.g. if the table has an IDENTITY column this needs to be set on the new table etc). But if you have a large table it would be more efficient to copy 10,000 rows to a new table then drop the original table, than trying to delete millions of rows to leave just 10,000.
DELETE FROM MyTable
WHERE TimeStamp < (SELECT min(TimeStamp)
FROM (SELECT TOP 10,000 TimeStamp
FROM MyTable
ORDER BY TimeStamp DESC))
or
DELETE FROM MyTable
WHERE TimeStamp < (SELECT min(TimeStamp)
FROM MyTable
WHERE PrimaryKey IN (SELECT TOP 10,000 TimeStamp
FROM MyTable
ORDER BY TimeStamp DESC))
Not sure if these are improvement as far as efficiency though.
精彩评论