开发者

SQL Query to delete oldest rows over a certain row count?

I have a table that contains log entries for a program I'm writing. I'm looking for ideas on an SQL query (I'm using SQL Server Express 2005) that will keep the newest X number of records, and delete the rest. I have a da开发者_StackOverflowtetime column that is a timestamp for the log entry.

I figure something like the following would work, but I'm not sure of the performance with the IN clause for larger numbers of records. Performance isn't critical, but I might as well do the best I can the first time.

DELETE FROM MyTable WHERE PrimaryKey NOT IN
(SELECT TOP 10,000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC)

I should mention that this query will run 3-4 times a day (as part of another process), so the number of records that will be deleted with each query will be small in comparison to the number of records that will be kept.


Try this:

DECLARE @X int
SELECT @X=COUNT(*) FROM MyTable 
SET @X=@X-10000

DELETE MyTable 
WHERE PrimaryKey IN (SELECT TOP(@x) PrimaryKey 
                     FROM MyTable 
                     ORDER BY TimeStamp ASC
                    ) 

kind of depends on if you are deleting fewer than 10,000 rows, if so this might run faster, as it identifies the rows to delete, not the rows to keep.


Try this, uses a CTE to get the row ordinal number, and then only deletes X number of rows at a time. You can alter this variable to suit your server.

Adding ReadPast table hint should prevent locking.

:

DECLARE @numberToDelete INT;
DECLARE @ROWSTOKEEP INT;
SET @ROWSTOKEEP = 50000;
SET @numberToDelete =1000;

WHILE 1=1
BEGIN
    WITH ROWSTODELETE AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY dtsTimeStamp DESC) rn,
            *
        FROM MyTable 

    )
    DELETE TOP (@numberToDelete) FROM ROWSTODELETE WITH(READPAST)
    WHERE rn>@ROWSTOKEEP;

    IF @@ROWCOUNT=0
        BREAK;
END;


The query you have is about as efficient as it gets, and is readable.

NOT IN and NOT EXISTS are more efficient than LEFT JOIN/IS NULL, but only because both columns can never be null. You can read this link for a more in-depth comparison.


This depends on your scenario (whether it's feasible for you) and how many rows you have, but there is a potentially far more optimal approach.

  1. Create a new copy of the log table with a new name
  2. Insert into the new table, the most recent 10,000 records from the original table
  3. Drop the original table (or rename)
  4. Rename the new table, to the proper name

This obviously requires more thought than just deleting rows (e.g. if the table has an IDENTITY column this needs to be set on the new table etc). But if you have a large table it would be more efficient to copy 10,000 rows to a new table then drop the original table, than trying to delete millions of rows to leave just 10,000.


DELETE FROM MyTable 
WHERE TimeStamp < (SELECT min(TimeStamp) 
                   FROM (SELECT TOP 10,000 TimeStamp 
                         FROM MyTable 
                         ORDER BY TimeStamp DESC))

or

DELETE FROM MyTable 
WHERE TimeStamp < (SELECT min(TimeStamp) 
                   FROM MyTable 
                   WHERE PrimaryKey IN (SELECT TOP 10,000 TimeStamp 
                                        FROM MyTable 
                                        ORDER BY TimeStamp DESC))

Not sure if these are improvement as far as efficiency though.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜