开发者

Excel Import how would you do it?

Ok i have a Excel import written. It uses excel automation to go through all the records and get the job done. BUt how would you do it if you had to do it?

Would you use SSIS? Would You use a Dataconnection? I am really confused as to the best way to get 开发者_高级运维this done properly. So that it doesn't slow down the actual application for the other clients when one client does an import. Thanks


An approach I have used in the past is to read Excel spreadsheet data using OleDb. This article has the basics:

http://www.davidhayden.com/blog/dave/archive/2006/05/26/2973.aspx

This avoids having to use automation, and proved to be quite efficient.


It depends on a lot of things. I've worked on two fairly large projects that imported data from Excel in two very different fashions and they had entirely different approaches that suited them.

SSIS Excel Import

This project involved slurping hundreds of files on a daily basis with a very well defined data format into a staging database that would then perform some ETL on that data to take it into a Data Warehouse (and then later a Data Mart and cubes, etc.)

Why we chose SSIS for this:

  • the rest of the ETL made sense to do in SSIS and it reduced complexity by being consistent
  • defining the schema for the Excel import was quite easy in SSIS and since it was very rigid (gov't regulated) we didn't have to worry about altering the resulting packages with a schema change
  • deployed nicely to the Integration Services server and could be run via SQL Agent jobs

Custom Financial Application using VSTO to import

This project took an existing spreadsheet that was already being used to calculate a bunch of financial metrics, including a macro which would run a number of combinations of scenarios based on various spreadsheet-defined assumptions, and inserted it all into a SQL Server database (where it went on a similar DW -> DM -> cube journey).

Why we chose VSTO for this:

  • we had to use their spreadsheet's model, and they wanted the ability to modify those calculations at will without having to do code changes and without having to learn how to program
  • that's basically it

Both approaches worked equally well but had various reasons for choosing them. The biggest one being flexibility of the format/structure of the data being pulled from Excel. If it's very well-defined and rigid, SSIS is an easy and low-user-interference way to do it. However, if they need to make a lot of changes, then an Office Interop platform (VSTO) might be the way to go.


That really depends on the requirements - is it for a single import of legacy data into a new system, will it be used on a regular basis, what is the data volume, is there a reusable data access and business logic layer for the target database, are the excel files already in Open XML format so that one can use the Open XML SDK, ...

But for a large range of situations I would just use an OLE DB connection to read from the Excel document and an existing data access and business logic layer for writing to the target database.


You can read Excel directly into your program using ADO.NET. Check out the following code -

string fileName = "c:\myFileName.xls"; 
string connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; 
    data source={0}; Extended Properties=Excel 8.0;", fileName); 

// Create the data adapter pointing to the spreadsheet
var oa = new OleDbDataAdapter("SELECT * FROM [myWorkSheetName$]",
    connectionString); 

// Create a blank data set
var ds = new DataSet();  

// Fill the data set using the adapter
oa.Fill(ds, "anything");  

// Create a data table from the data set
DataTable dt = ds.Tables["anything"]; 

You can then manipulate the data in the data table however you wish.


I would read the Excel spread sheet to a data table with a 3rd party component. I would not recommend using OLEDB and a Excel provider, since I have had a lot of Unicode issues using those. Then I would use the System.Data.SqlClient.SqlBulkCopy to map the data and bulk/insert it to the database.

If you know that the table your importing to is only accessed/locked by your application, you can configure SqlBulkCopy to do a table lock, which will improve performance notibly; in my case, up to 12-15x!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜