SqlBulkCopy slow as molasses
I'm looking for the fastest way to bulk-load data via C#. I have this script that does the job, but it's slow. I've read testimonies that SqlBulkCopy is the fastest. 1,000 records take 2.5 seconds, and the files contain anywhere from 5,000 to 250,000 records. What are some of the things that can slow it down?

Table def:
CREATE TABLE [dbo].[tempDispositions](
    [QuotaGroup] [varchar](100) NULL,
    [Country] [varchar](50) NULL,
    [ServiceGroup] [varchar](50) NULL,
    [Language] [varchar](50) NULL,
    [ContactChannel] [varchar](10) NULL,
    [TrackingID] [varchar](20) NULL,
    [CaseClosedDate] [varchar](25) NULL,
    [MSFTRep] [varchar](50) NULL,
    [CustEmail] [varchar](100) NULL,
    [CustPhone] [varchar](100) NULL,
    [CustomerName] [nvarchar](100) NULL,
    [ProductFamily] [varchar](35) NULL,
    [ProductSubType] [varchar](255) NULL,
    [CandidateReceivedDate] [varchar](25) NULL,
    [SurveyMode] [varchar](1) NULL,
    [SurveyWaveStartDate] [varchar](25) NULL,
    [SurveyInvitationDate] [varchar](25) NULL,
    [SurveyReminderDate] [varchar](25) NULL,
    [SurveyCompleteDate] [varchar](25) NULL,
    [OptOutDate] [varchar](25) NULL,
    [SurveyWaveEndDate] [varchar](25) NULL,
    [DispositionCode] [varchar](5) NULL,
    [SurveyName] [varchar](20) NULL,
    [SurveyVendor] [varchar](20) NULL,
    [BusinessUnitName] [varchar](25) NULL,
    [UploadId] [int] NULL,
    [LineNumber] [int] NULL,
    [BusinessUnitSubgroup] [varchar](25) NULL,
    [FileDate] [datetime] NULL
) ON [PRIMARY]
and here's the code
private void BulkLoadContent(DataTable dt)
{
    OnMessage("Bulk loading records to temp table");
    OnSubMessage("Bulk Load Started");
    using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
    {
        bcp.DestinationTableName = "dbo.tempDispositions";
        bcp.BulkCopyTimeout = 0;
        foreach (DataColumn dc in dt.Columns)
        {
            bcp.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
        }
        bcp.NotifyAfter = 2000;
        bcp.SqlRowsCopied += new SqlRowsCopiedEventHandler(bcp_SqlRowsCopied);
        bcp.WriteToServer(dt);
        bcp.Close();
    }
}
Do you have any indexes, triggers or constraints on that table?
Those will cause slowdowns on insert; a clustered index in particular would hurt. When blasting the amounts of data you're doing, it's best to drop the indexes first and re-apply them afterwards.
A good post about it is here: What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
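The drop-load-recreate pattern looks roughly like this; the index name and key column here are purely illustrative (the question's table has no indexes shown), so substitute your own:

```sql
-- Drop the index before the bulk load (hypothetical index name)
DROP INDEX IX_tempDispositions_TrackingID ON dbo.tempDispositions;

-- ... run SqlBulkCopy from C# here ...

-- Re-create the index once the load has finished
CREATE NONCLUSTERED INDEX IX_tempDispositions_TrackingID
    ON dbo.tempDispositions ([TrackingID]);
```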
If you have lots of data, setting the batchsize to a reasonably large number might help:
bcp.BatchSize = 10000;
Things that can slow down the bulk copy:

- Full-text indexes on the table
- Triggers on insert
- Foreign-key constraints
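If you can't drop triggers and constraints outright, they can be disabled for the duration of the load. A T-SQL sketch (the table name is from the question; whether it actually has triggers or constraints isn't shown):

```sql
-- Disable triggers and constraints before the load
ALTER TABLE dbo.tempDispositions DISABLE TRIGGER ALL;
ALTER TABLE dbo.tempDispositions NOCHECK CONSTRAINT ALL;

-- ... bulk load ...

-- Re-enable afterwards (WITH CHECK re-validates the rows that were loaded)
ALTER TABLE dbo.tempDispositions ENABLE TRIGGER ALL;
ALTER TABLE dbo.tempDispositions WITH CHECK CHECK CONSTRAINT ALL;
```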
I've noticed that trying to flush large datasets is initially much faster, but slows down substantially over time. I've found a modest increase in performance using a buffered approach, feeding bulk copy just a few thousand records at a time over the same connection. It seems to keep the per-batch transaction time down, which improves performance over time. In my solution, the same method un-buffered saves about 5,000,000 records in the time it takes this method to save about 7,500,000 records of the same type to the same DB. Hope this helps someone.
// My incoming DataTables have a million or so rows each and slow down
// over time to nothing. This helps.
public void flush_DataTable(DataTable dt, string tableName)
{
    int bufferSize = 10000;
    int bufferHigh = bufferSize;
    int lowBuffer = 0;
    if (dt.Rows.Count >= bufferSize)
    {
        using (SqlConnection conn = getConn())
        {
            conn.Open();
            // Loop until every row has been written, including the final batch.
            while (lowBuffer < dt.Rows.Count)
            {
                using (SqlBulkCopy s = new SqlBulkCopy(conn))
                {
                    s.BulkCopyTimeout = 900;
                    s.DestinationTableName = tableName;
                    s.BatchSize = bufferSize;
                    s.EnableStreaming = true;
                    foreach (var column in dt.Columns)
                        s.ColumnMappings.Add(column.ToString(), column.ToString());

                    // Copy the current slice of rows into a fresh buffer table.
                    DataTable bufferedTable = dt.Clone();
                    for (int bu = lowBuffer; bu < bufferHigh; bu++)
                    {
                        bufferedTable.ImportRow(dt.Rows[bu]);
                    }
                    s.WriteToServer(bufferedTable);

                    if (bufferHigh == dt.Rows.Count)
                    {
                        break;
                    }
                    lowBuffer = bufferHigh;
                    bufferHigh += bufferSize;
                    if (bufferHigh > dt.Rows.Count)
                    {
                        bufferHigh = dt.Rows.Count;
                    }
                }
            }
            conn.Close();
        }
    }
    else
    {
        // Perform a non-buffered flush (could just as easily flush the
        // buffer here, but I already had the other method).
        flushDataTable(dt, tableName);
    }
}
Setting BatchSize or NotifyAfter will both cause a slowdown.
If you have a data reader, you can wrap it with a row-counting data reader to get the progress.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.EnableStreaming = true;
bulkCopy.BulkCopyTimeout = 0;
using (var reader = new MyRowCountDataReader(dataReader))
{
reader.NotifyAfter = 1000;
reader.RowRead += (sender, args) => { ... };
bulkCopy.WriteToServer(reader);
}
}
I just had a similar experience. SqlBulkCopy was fine with one table, but with another it was at least 10x slower to copy the same number of records.
The slow table did have non-clustered indexes and a trigger, but disabling them didn't make any appreciable difference.
It turns out the slow table had six NVARCHAR(MAX) columns. The data I was actually inserting was all fairly short, so I changed the columns to NVARCHAR(500).
Et voilà! The performance of the slow table improved to match the other table - at least a 10x speed improvement.
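A sketch of that change; the table and column names here are stand-ins, since the answer doesn't show the slow table's schema:

```sql
-- Shrink an NVARCHAR(MAX) column to a bounded length
-- (hypothetical table and column names)
ALTER TABLE dbo.SlowTable
    ALTER COLUMN [SomeWideColumn] NVARCHAR(500) NULL;
```

Note that ALTER COLUMN will fail if any existing value exceeds the new length, so check the data first.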
The IDataReader implementation I suggested here: How to implement IDataReader? may help you. I used it with SqlBulkCopy as follows:
using (MyFileDataReader reader = new MyFileDataReader(@"C:\myfile.txt"))
{
    SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "[my_table]";
    bulkCopy.BatchSize = 10000;
    bulkCopy.WriteToServer(reader);
    bulkCopy.Close();
}