
Interpreting and fine-tuning the BATCHSIZE parameter?

So I am playing around with the BULK INSERT statement and am beginning to love it. What was taking the SQL Server Import/Export Wizard 7 hours is taking only 1-3 hours using BULK INSERT. However, what I am observing is that the time to completion is heavily dependent on the BATCHSIZE specification.
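
For reference, the statement I am running looks roughly like this (the table name, file path, and delimiters are placeholders for my actual setup):

BULK INSERT dbo.MyStagingTable
FROM 'D:\data\records.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the source file
    ROWTERMINATOR = '\n',    -- row delimiter in the source file
    BATCHSIZE = 10000        -- the parameter I am experimenting with
);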

Following are the times I observed for a 5.7 GB file containing 50 million records:

BATCHSIZE = 50000, Time Taken: 17:30 mins
BATCHSIZE = 10000, Time Taken: 14:00 mins
BATCHSIZE = 5000,  Time Taken: 15:00 mins

This makes me curious: is it possible to determine a good value for BATCHSIZE? If so, what factors does it depend on, and can it be approximated without having to run the same query tens of times?

My next run will be on a 70 GB file containing 780 million records. Any suggestions would be appreciated. I will report back the results once I finish.


There is some information here, and it appears the batch size should be as large as is practical; the documentation states that, in general, the larger the batch size the better the performance, but that is not what you are experiencing at all. 10k seems to be a good batch size to start with, but I would also look at optimizing the bulk insert from other angles, such as putting the database into the SIMPLE recovery model or specifying a TABLOCK hint during your import (see the sketch below).
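
Something along these lines is what I have in mind; the database, table, and file names are placeholders, so adjust them to your setup:

-- Minimal logging needs SIMPLE (or BULK_LOGGED) recovery plus a table lock
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;

BULK INSERT dbo.MyStagingTable
FROM 'D:\data\records.csv'
WITH (
    BATCHSIZE = 10000,   -- commits every 10k rows; tune against your timings
    TABLOCK              -- table-level lock, helps qualify for minimal logging
);

-- Switch back (and take a full backup) once the load is done, if you need
-- point-in-time recovery on this database
ALTER DATABASE MyDatabase SET RECOVERY FULL;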
