
Bulk Copy from small table to larger one in SQL Server 2005

I'm a newbie with SQL Server and have the following dilemma:

I have two tables with the same structure. Call them runningTbl and finalTbl.

runningTbl accumulates about 600,000 to 1 million rows every 15 minutes.

After doing some data cleanup in runningTbl I want to move all the records to finalTbl. finalTbl currently has about 38 million rows.

The above process needs to be repeated every 15-20 minutes.

The problem is that moving the data from runningTbl to finalTbl sometimes takes well over 20 minutes.

Initially when the tables were small it took anything from 10 seconds to 2 minutes to copy.

Now it just takes too long.

Anyone that can assist with this? SQL query to follow.

Thanks


There are a number of things you will need to do to copy the data as efficiently as possible. So far you are on the right track, but you have a long way to go. First, look at your indexes; there may be optimizations there that can help. Next, make sure you don't have triggers on this table that could cause a slowdown. Finally, change the database's recovery model to something minimally logged, such as BULK_LOGGED, if that is permissible in your environment. A sketch of those last two levers is below.
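For example, the index and logging suggestions might look like this in T-SQL; the database name MyDb and the index name IX_finalTbl_SomeCol are placeholders, and switching to BULK_LOGGED assumes your backup strategy tolerates it:

    -- Switch to minimal logging for the duration of the bulk load window
    -- (assumption: your backup plan allows BULK_LOGGED).
    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

    -- Disable nonclustered indexes so the insert doesn't maintain them row by row
    -- (IX_finalTbl_SomeCol is a hypothetical index name).
    ALTER INDEX IX_finalTbl_SomeCol ON finalTbl DISABLE;

    -- ... bulk load from runningTbl into finalTbl here ...

    -- Rebuild the disabled index once, after the load, and restore full logging.
    ALTER INDEX IX_finalTbl_SomeCol ON finalTbl REBUILD;
    ALTER DATABASE MyDb SET RECOVERY FULL;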

There is a bunch more help here (from Microsoft):

http://msdn.microsoft.com/en-us/library/ms190421(v=SQL.90).aspx

Basically you are on the right track using BCP. This is actually Microsoft's recommendation:

To bulk-copy data from one instance of SQL Server to another, use bcp to export the table data into a data file. Then use one of the bulk import methods to import the data from the file to a table. Perform both the bulk export and bulk import operations using either native or Unicode native format.
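Concretely, that round trip might look like the following from a command prompt. The server, database, and file names are placeholders; -n selects native format as the quote recommends, -T uses a trusted connection, -b batches the import, and the TABLOCK hint takes a table lock for speed:

    rem Export the cleaned-up rows in native format (placeholders throughout).
    bcp MyDb.dbo.runningTbl out C:\data\running.dat -n -T -S MYSERVER

    rem Import into the big table in batches under a table lock.
    bcp MyDb.dbo.finalTbl in C:\data\running.dat -n -T -S MYSERVER -b 50000 -h "TABLOCK"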

When you do this, though, you also need to consider dropping your indexes if too much data is being brought in (depending on the type of index you use). If you use a clustered index, it may also be a good idea to sort your data on the clustering key before the import, as sketched after the link. Here is more information (including the source of the above quote):

http://msdn.microsoft.com/en-US/library/ms177445(v=SQL.90).aspx
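If the data file is already sorted on the clustering key, you can tell the bulk import so it skips the sort. This sketch assumes a hypothetical clustered index on a UnixTime column and reuses the native-format file from the bcp example above:

    -- Pre-sorted, batched import under a table lock (placeholder path and
    -- column name; assumes finalTbl is clustered on UnixTime).
    BULK INSERT finalTbl
    FROM 'C:\data\running.dat'
    WITH (
        DATAFILETYPE = 'native',
        TABLOCK,
        ORDER (UnixTime ASC),
        BATCHSIZE = 100000
    );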


For starters: one of the things I've learned over the years is that MSSQL does a great job of optimizing all kinds of operations, but to do so it relies heavily on the statistics for all the tables involved. Hence, I would suggest running "UPDATE STATISTICS processed_logs" and "UPDATE STATISTICS unprocessed_logs" before running the actual inserts; even on a large table these don't take all that long.

Apart from that, based on the query above, a lot depends on the indexes of the target table. I'm assuming the target table has its clustered index (or PRIMARY KEY) on (at least) UnixTime; if not, you'll create major data fragmentation when you squeeze more and more data in between the already existing records. To work around this you could try defragmenting the target table once in a while (it can be done online, but takes a long time), but making the clustered index (or PK) such that data is always appended to the end of the table would be the better approach; well, at least in my opinion. A sketch of both suggestions follows.
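A minimal sketch of both, keeping the table names from this answer and using a hypothetical clustered index name CIX_processed_logs:

    -- Refresh the optimizer's statistics before the big insert.
    UPDATE STATISTICS processed_logs;
    UPDATE STATISTICS unprocessed_logs;

    -- Occasional defragmentation of the target's clustered index
    -- (CIX_processed_logs is a placeholder). REORGANIZE is always online;
    ALTER INDEX CIX_processed_logs ON processed_logs REORGANIZE;
    -- the heavier alternative needs Enterprise Edition to run online:
    -- ALTER INDEX CIX_processed_logs ON processed_logs REBUILD WITH (ONLINE = ON);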


I suggest you use a Windows service with a timer and a boolean flag. Once your request is sent to the server, set the flag; the timer event should not execute the copy again until the flag is cleared.
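If you'd rather keep that guard in the database than in service code, an application lock achieves the same "don't start a new run while one is in flight" behavior. This is a sketch of that alternative (the lock resource name is arbitrary):

    -- Serialize runs: a new 15-minute cycle bails out immediately
    -- if the previous copy is still executing.
    DECLARE @rc int;
    EXEC @rc = sp_getapplock
        @Resource    = 'runningTbl_to_finalTbl_copy',  -- arbitrary lock name
        @LockMode    = 'Exclusive',
        @LockOwner   = 'Session',
        @LockTimeout = 0;                              -- don't wait; skip this cycle

    IF @rc >= 0
    BEGIN
        -- ... perform the cleanup and the copy from runningTbl to finalTbl ...
        EXEC sp_releaseapplock
            @Resource  = 'runningTbl_to_finalTbl_copy',
            @LockOwner = 'Session';
    END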
