bulk insert from csv to DB using Linq to sql
I have implemented an application which taking the data from csv file and inserting into Sql DB, i am using Linq to sql. also i have requirement of skiping those records whih are having some validation, to achive this i used a loop and inside the loop, calling submitchnages().
problem: this app is working for less number of records(<100) but in reality ill be getting csv file of more 3 - 4 lacs of record. I just simply ran my app against these big files, in result the app is开发者_如何学Python taking long time (5 -6 hrs).
Please suggest any better approach.
Linq-to-SQL is great for getting data OUT of the database, or for validation and a small handful of inserts/updates at once. But for what you're doing (ETL), it sounds like you need to look into the SqlBulkCopy
object. Go ahead and use your L2S objects to do the validation, but then instead of submitting the changes, just map the objects into a good old fashioned ADO.NET DataTable and the every 1000 records or so, bulk insert them.
If performance is a big concern, LINQ to SQL might not be the tool for the job. However, before tossing LINQ to SQL out the door for your solution, you might consider the following:
- Try creating a new
DataContext
after a certain number of records. TheDataContext
caches all entities so send to and retrieve from the database, which will lead to a large memory footprint and eventually... out of memory. - Use the SQL Profiler to see what queries LINQ to SQL sends to the database. Possibly, LINQ to SQL also queries the database for each entity you create.
- Try to tune the database on inserts. This might be difficult, but you can try writing those records to an intermediate table (with less dependencies) and use a stored procedure to move the data to it's final destination.
Bulk inserts is something O/RMs are not good at, so you might need to take a different approach.
If you have to do the inserts using Linq2Sql, you may want to do intermittent commits. Something like this -
public void LoadLargeDataUsingLinqToSql(string pathToCSV){
DataTable dt = LoadMyCSVToDataTable(pathToCSV);
int myPerformanceCounter = 0;
foreach(DataRow dr in dt.Rows()){
MyLinqClass m = ConvertDRToMyLinqClass(dr);
if(m.IsValidAndReadyToBeSaved()){
MyDataContext.MyLinqClassRef.InsertOnSubmit(m);
myPerformanceCounter++;
}
if(myPerformaceCounter>25000){
//Commit to clear cache.
MyDataContext.SubmitChanges();
myPerformanceCounter=0;
}
}
//Commit leftovers
MyDataContext.SubmitChanges();
}
精彩评论