开发者

How do I speed up DbSet.Add()?

I have to import about 30k rows from a CSV file to my SQL database, this sadly takes 20 minutes.

Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?

I have these Entity Framework Code-First classes:

public class Article
{
    // About 20 properties, each property doesn't store excessive amounts of data
}

public class Database : DbContext
{
    public DbSet<Article> Articles { get; set; }
}

For each item in my for loop I do:

db.Articles.Add(article);

Outside the for loop I do:

db.SaveChanges();

It's connected with my local SQLExpress server, but I guess there isn't anything written till SaveChanges is being called so I guess开发者_JAVA百科 the server won't be the problem....


As per Kevin Ramen's comment (Mar 29) I can confirm that setting db.Configuration.AutoDetectChangesEnabled = false makes a huge difference in speed

Running Add() on 2324 items by default ran 3min 15sec on my machine, disabling the auto-detection resulted in the operation completing in 0.5sec.

http://blog.larud.net/archive/2011/07/12/bulk-load-items-to-a-ef-4-1-code-first-aspx


I'm going to add to Kervin Ramen's comment by saying that if you are only doing inserts (no updates or deletes) then you can, in general, safely set the following properties before doing any inserts on the context:

DbContext.Configuration.AutoDetectChangesEnabled = false;
DbContext.Configuration.ValidateOnSaveEnabled = false;

I was having a problem with a once-off bulk import at my work. Without setting the above properties, adding about 7500 complicated objects to the context was taking over 30 minutes. Setting the above properties (so disabling EF checks and change tracking) reduced the import down to seconds.

But, again, I stress only use this if you are doing inserts. If you need to mix inserts with updates/deletes you can split your code into two paths and disable the EF checks for the insert part and then re-enable the checks for the update/delete path. I have used this approach succesfully to get around the slow DbSet.Add() behaviour.


Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.

The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.

Beyond that; SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF though.


There is an extremely easy to use and very fast extension here: https://efbulkinsert.codeplex.com/

It's called "Entity Framework Bulk Insert".

Extension itself is in namespace EntityFramework.BulkInsert.Extensions. So to reveal the extension method add using

using EntityFramework.BulkInsert.Extensions;

And then you can do this

context.BulkInsert(entities);

BTW - If you do not wish to use this extension for some reason, you could also try instead of running db.Articles.Add(article) for each article, to create each time a list of several articles and then use AddRange (new in EF version 6, along with RemoveRange) to add them together to the dbcontext.


I haven't really tried this, but my logic would be to hold on to ODBC driver to load file into datatable and then to use sql stored procedure to pass table to procedure.

For the first part, try: http://www.c-sharpcorner.com/UploadFile/mahesh/AccessTextDb12052005071306AM/AccessTextDb.aspx

For the second part try this for SQL procedure: http://www.builderau.com.au/program/sqlserver/soa/Passing-table-valued-parameters-in-SQL-Server-2008/0,339028455,339282577,00.htm

And create SqlCommnand object in c# and add to its Parameters collection SqlParameter that is SqlDbType.Structured

Well, I hope it helps.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜