What is the best way to delete millions of records in T-SQL?
I have the following table structure:
Table1    Table2    Table3
------------------------------
sId       sId       sId
name      x         y
x1        x2        x3
I want to remove all records from Table1 that do not have a matching record in Table3 (matched on sId), but if the sId is present in Table2, the record must not be deleted from Table1. There are about 20, 15 and 10 million records in Table1, Table2 and Table3 respectively. I have done something like this:
Delete Top (3000000)
From Table1 A
Left Join Table2 B
on A.Name ='XYZ' and
B.sId = A.sId
Left Join Table3 C
on A.Name = 'XYZ' and
C.sId = A.sId
(I have added an index on sId, but not on Name.) But this takes a long time to remove the records. Is there a better way to delete millions of records? Thanks in advance.
Do it in batches of 5000 or 10000 instead if you need to delete less than 40% of the data. If you need to delete more than that, dump the rows you want to keep into another table (or bcp them out), truncate this table, and then insert the saved rows back (or bcp them in).
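-- Track the row count in a variable so the loop always runs at least once
DECLARE @rows INT
SET @rows = 1
WHILE @rows > 0
BEGIN
    -- T-SQL needs the target alias after TOP when the DELETE uses joins
    DELETE TOP (5000) A
    From Table1 A
    Left Join Table2 B
      on B.sId = A.sId
    Left Join Table3 C
      on C.sId = A.sId
    -- Only rows with no match in Table2 and no match in Table3 qualify
    Where A.Name = 'XYZ'
      and B.sId is null
      and C.sId is null
    SET @rows = @@ROWCOUNT
END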
Small example you can run to see what happens
CREATE TABLE #test(id INT)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
WHILE @@rowcount > 0
BEGIN
DELETE TOP (2) FROM #test
END
One way to remove millions of records is to select the remaining records into new tables, then drop the old tables and rename the new ones. Which variant is best for you depends on the foreign keys: you can either drop and recreate the foreign keys, or truncate the data in the old tables and copy the selected data back.
If you need to delete just a few records, disregard this answer. This is for when you actually want to DELETE millions of records.
One other method is to insert the data that you want to keep into another table, say Table1_good. Once that is completed and verified, drop Table1 and then rename Table1_good to Table1.
Dirty way to do it but it works.
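A rough sketch of that copy-and-rename approach, assuming the rule from the question (a Table1 row is kept when its sId appears in Table2 or in Table3) and leaving out the Name = 'XYZ' filter. Note that SELECT ... INTO does not copy indexes, constraints, or permissions, so those would have to be recreated on the new table, and any foreign keys pointing at Table1 would block the drop.

```sql
-- Copy only the rows to keep (assumed rule: sId exists in Table2 or Table3)
SELECT A.*
INTO Table1_good
FROM Table1 A
WHERE EXISTS (SELECT 1 FROM Table2 B WHERE B.sId = A.sId)
   OR EXISTS (SELECT 1 FROM Table3 C WHERE C.sId = A.sId);

-- Check the copy before touching the original
SELECT COUNT(*) AS kept_rows FROM Table1_good;

-- Once verified, swap the tables
DROP TABLE Table1;
EXEC sp_rename 'Table1_good', 'Table1';
```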
Using the top clause is more for improving concurrency and may actually make the code run slower.
One suggestion is to delete the data from a derived table: http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx
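The linked post is not reproduced here; as a minimal sketch of the general idea, a batch can be picked inside a derived table and deleted through it. The filter below is an assumption based on the question (Name = 'XYZ', no match in Table2, no match in Table3), and the batch size of 5000 is arbitrary.

```sql
-- Delete one batch through a derived table; the ORDER BY lets the
-- engine walk the sId index instead of scanning for arbitrary rows
DELETE d
FROM (
    SELECT TOP (5000) A.sId
    FROM Table1 A
    WHERE A.Name = 'XYZ'
      AND NOT EXISTS (SELECT 1 FROM Table2 B WHERE B.sId = A.sId)
      AND NOT EXISTS (SELECT 1 FROM Table3 C WHERE C.sId = A.sId)
    ORDER BY A.sId
) AS d;
```

Running this statement repeatedly until it affects zero rows gives the same batching effect as the loop shown earlier.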
Have you set up appropriate indexes on the relevant table fields? If not it could take a long time to delete the records.
The DELETE operation you're performing is running an underlying SELECT statement to find the records that will be deleted. The operation you're doing is fundamentally a simple join. If you optimize that join, the final DELETE will be faster, too.
Make sure you have indexes on the columns you're joining on. Look at the execution plan to make sure they are being used.
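As a sketch of what that could look like for the tables in the question: the index names below are hypothetical, and any index that already exists (the asker mentions one on sId) should be skipped. SET SHOWPLAN_TEXT ON makes SQL Server return the estimated plan instead of running the statements that follow, so the join can be inspected without deleting anything. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical index names; the Name column had no index in the question
CREATE INDEX IX_Table1_Name_sId ON Table1 (Name, sId);
CREATE INDEX IX_Table2_sId ON Table2 (sId);
CREATE INDEX IX_Table3_sId ON Table3 (sId);
GO
-- Show the estimated plan rather than executing the next batch
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*)
FROM Table1 A
WHERE A.Name = 'XYZ'
  AND NOT EXISTS (SELECT 1 FROM Table2 B WHERE B.sId = A.sId)
  AND NOT EXISTS (SELECT 1 FROM Table3 C WHERE C.sId = A.sId);
GO
SET SHOWPLAN_TEXT OFF;
GO
```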
Once you have cleaned up the data, I would put an AFTER DELETE trigger on table3 that automatically deleted the applicable records from table1. This way you keep the data cleaned up in real time and never have to delete huge chunks.
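A minimal sketch of such a trigger, assuming the same rule as in the question (a Table1 row goes away only when its sId is no longer in Table3 and is not in Table2). The trigger name is made up for illustration, and the Name filter is left out.

```sql
CREATE TRIGGER trg_Table3_AfterDelete
ON Table3
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the rows just removed from Table3
    DELETE A
    FROM Table1 A
    JOIN deleted d ON d.sId = A.sId
    -- Keep the row if Table2 still references this sId
    WHERE NOT EXISTS (SELECT 1 FROM Table2 B WHERE B.sId = A.sId)
    -- Keep the row if another Table3 row with the same sId remains
      AND NOT EXISTS (SELECT 1 FROM Table3 C WHERE C.sId = A.sId);
END
```

The trade-off is that every delete on Table3 now does a little extra work, in exchange for never having to run a huge cleanup again.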
I'd create a temp table, populate it with a select, add indexes to the temp table, and then delete from the table I want to remove records from. I would drop the temp table when I'm done. Something like this:
Select * into #temp from mytable
Where blah blah(or your query)
-- add constraints if you want
I would just shove the primary key into the temp table, and then I would say:
Delete mytable where myPrimarykey in (select myPrimarykey from #temp)