Fastest and best method for importing file contents into a database
I have folders where approximately 3000 new CSV files arrive on a daily basis, each containing between 50 and 2000 lines of information.
Currently, there is a process in place which picks these files up one at a time and takes each line one at a time and sends it to a stored procedure to insert the contents into a database.
This means that over the course of a day, it can struggle to get through the 3000 files before the next 3000 come in!
I'm looking to improve this process and have the following ideas:
- Use the new parallel features of C# 4.0 to allow multiple files to be processed at once, still passing the lines one by one to the stored proc (see the sketch after this list)
- Create a new temporary database table into which all the rows in a file can be inserted at once, then call the stored procedure on the newly added rows in the temp table.
- Split the process into two jobs: one to read data from the files into the temporary database table, the other to process the rows in that table.
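For the first idea, something like this is what I have in mind; the folder path is a placeholder and `ProcessFile` stands in for the existing per-file logic:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class Importer
{
    static void Main()
    {
        // Throttle parallelism so the database isn't swamped with connections.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(
            Directory.EnumerateFiles(@"C:\Incoming", "*.csv"),
            options,
            file => ProcessFile(file));
    }

    static void ProcessFile(string path)
    {
        // Placeholder for the existing logic: read each line and
        // send it to the stored procedure.
    }
}
```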
Any other ideas on how I could look at doing this? Currently it can take up to 20 seconds per file, and I'd really like to improve performance considerably.
SQL Server's BULK INSERT might be just what you need:
http://msdn.microsoft.com/en-us/library/ms188365.aspx
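A minimal sketch of issuing a BULK INSERT from C#; the connection string, table name, file path, and CSV layout (comma-separated fields, newline-terminated rows) are all assumptions to adapt:

```csharp
using System.Data.SqlClient;

class BulkLoader
{
    static void Main()
    {
        // Note: BULK INSERT runs server-side, so the file path must be
        // visible to the SQL Server instance, not just to this client.
        using (var conn = new SqlConnection(
            "Server=.;Database=Imports;Integrated Security=true"))
        {
            conn.Open();

            const string sql = @"BULK INSERT dbo.StagingTable
                                 FROM 'C:\Incoming\file0001.csv'
                                 WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')";

            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.ExecuteNonQuery(); // loads the whole file in one operation
            }
        }
    }
}
```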
Another issue you may be seeing with all of those inserts taking a long time is that every time a row is added, every index on the table has to be updated. A search like this will turn up lots of good articles on getting better performance out of your current procedure: http://www.google.com/search?q=sql+insert+performance
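One common mitigation (not something your current code does, just a sketch): disable nonclustered indexes for the duration of a large load and rebuild them afterwards. The index and table names below are placeholders:

```csharp
using System.Data.SqlClient;

class IndexToggle
{
    // Rebuilding a nonclustered index once after a large load is often
    // cheaper than updating it row by row during the load.
    // (Don't disable the clustered index; that makes the table unreadable.)
    static void RunWithIndexDisabled(SqlConnection openConnection)
    {
        new SqlCommand("ALTER INDEX IX_StagingTable_ColA ON dbo.StagingTable DISABLE",
                       openConnection).ExecuteNonQuery();

        // ... perform the bulk load here ...

        new SqlCommand("ALTER INDEX IX_StagingTable_ColA ON dbo.StagingTable REBUILD",
                       openConnection).ExecuteNonQuery();
    }
}
```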
You can use SQL Server's native bcp utility.
More info about the bcp utility can be found here: Importing and Exporting Bulk Data by Using the bcp Utility
You can also take a look at: About Bulk Import and Bulk Export Operations
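As an illustration, a typical import invocation might look like the following; the database, table, server, and file names are placeholders (-T uses a trusted connection, -c means character data, and -t, sets a comma field terminator):

```
bcp Imports.dbo.StagingTable in "C:\Incoming\file0001.csv" -S myserver -T -c -t,
```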
Let's say that all 3000 files to be imported have 2000 rows each. That's 6 million rows per day. The bottleneck might not be at the client doing the inserts, but with the database itself. If indexes are enabled on the table(s) in question, inserts could be slow, depending upon how heavily indexed the table(s) is/are. What evidence has led you to conclude that it is the database which is sitting idle waiting for work and the import routine that is lagging behind, rather than the other way around?
You said

> Currently, there is a process in place which picks these files up one at a time and takes each line *one at a time* and sends it to a stored procedure to insert the contents into a database.

(Emphasis added.)
That seems to mean one line equals one transaction.
Fix that.
- Pre-process the files so they're acceptable for bulk loading.
- Pre-process the files so they form valid SQL INSERT statements, and load them that way. (In a single transaction.)
I guess both of those sound like "replace your stored procedure". But the real point is to reduce the number of transactions. Either of those options would reduce the number of transactions for this process from 6 million a day (worst case) to 3000 a day.
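To make the "reduce the number of transactions" point concrete, here is a minimal sketch that wraps all of one file's inserts in a single transaction; the table, columns, and connection string are placeholders, and the CSV is assumed to be a simple two-column, comma-separated layout:

```csharp
using System.Data.SqlClient;
using System.IO;

class SingleTransactionImport
{
    // One transaction per file: ~3000 commits a day instead of ~6 million.
    static void ImportFile(string path, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            {
                foreach (var line in File.ReadLines(path))
                {
                    var fields = line.Split(','); // assumes two comma-separated columns

                    using (var cmd = new SqlCommand(
                        "INSERT INTO dbo.StagingTable (ColA, ColB) VALUES (@a, @b)",
                        conn, tran))
                    {
                        cmd.Parameters.AddWithValue("@a", fields[0]);
                        cmd.Parameters.AddWithValue("@b", fields[1]);
                        cmd.ExecuteNonQuery();
                    }
                }
                tran.Commit(); // a single commit for the whole file
            }
        }
    }
}
```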