开发者

Integration Services vs raw SQL (T-SQL)

When would you use Integration Services and when would just use SQL/stored procedures? What are the advantages of using one or the other?

If you 开发者_运维知识库are migrating data from a legacy system, would you use SSIS or just SQL?


Generally I use BCP for any flat files, linked server calls for server to server transfer and the built in XML support for XML files and then stored procedures for any transformation.

SSIS is useful for complex ETL processes OR if you need to preserve state from step to step. Otherwise it just proves to be an additional burden as it's difficult to debug, buggy and I personally find it more time consuming to develop with it.


MIGRATING DATA FROM LEGACY SYSTEM

  1. If you are migrating data from a legacy system and you have the option of restoring the legacy system database to your new server, then you don't necessarily need SSIS. T-SQL commands on the new server are probably going to be faster to script out and run than a SSIS data flow if you are taking the raw data in whole.

  2. If the legacy data is only accessible via linked server or through SSIS, then you are probably better off using SSIS to create the destination tables and load the data. Stored procedure calls that pull the data across via linked server are generally slower than SSIS data flow connections.

  3. If the legacy server is a non-SQL Server database (like MYSQL or Oracle), then you are probably best off using SSIS and data flows. You could do this via linked servers, but it will probably require more work.

  4. Even if you choose to use SSIS to pull data from another server, I would recommend creating all the tables as T-SQL commands rather than relying on the create table option in the Destination data flow. The create table option doesn't create indexes, clustered indexes, primary keys, etc. and it also defaults every field to NULL. When migrating data from a legacy server, I usually script out the original object, modify the script, and run the modified script on my target server.

  5. I would not recommend using the default wizards in SSIS to import data unless it is a one-time process. These are generally not the best from either a performance stand-point or from a maintenance stand-point. Even when loading the data once, I generally prefer to script out each object individually to make certain I don't repeat the errors of the legacy database in the new database.

  6. If you intend to load the data from the legacy system in a reoccurring manner, then I would recommend using a SSIS package just for maintainability and extensibility. Even if you decide it needs to execute nothing but stored procedures, at least you can parallelize it and organize the stored procedure calls in a meaningfull way. If you go with just executing the stored procedure sequentially in a SQL Server Agent job step, then you aren't going to be able to easily parallelize the process.

OTHER SCENARIOS

  1. I generally call stored procedures as SQL Server Agent job steps when I want to segregate processes into individual steps either for recoverability or performance monitoring purposes. For instance, if I know I need to prepare data before kicking off a SSRS report job, then I usually will make a two-step SQL Server agent job. The first loads data into a table and the second kicks off a shared schedule.

  2. Another scenario is that I have several core data warehouse tables that need to be loaded before I run several other dependent processes. I will usually use a stored procedure to handle complex scrubbing rules that load a table and then kick off several other jobs that use the newly loaded table. All the subsequent steps can either be T-SQL commands to kick off other jobs, T-SQL commands to scrub more data, SSIS packages to move data between servers or output files, etc.

  3. I always use SSIS to import data or export data. I do not use equivalent T-SQL commands to do this even though they are available. I do this because I want the logging, recovery options, ease of development, ease of maintenance, and standardized package workflows.

  4. Since I run a team of BI developers who do anything from writing complex T-SQL statements to developing SSIS packages or SSRS reports, SSIS packages give us an easy way for anyone on my team to understand other team members' work.


Primarily MS Integration Services is used for ETL scenarios, importing of data from various types of data sources, or complex transformations. If your scenario isn't one of those then generally a normal stored procedure will work best.


I use SSIS when the conversion is going to be complex, esp if I need to pull from non-SQL Server sources, such as Excel spreadsheets, or Visual FoxPro, esp if I will need to check to see if the data already exists before inserting a record with a foreign key constraint.

So, if I am importing grades, then I may use SSIS to ensure that the students all exist, and if the student does exist, then just do an update.

Once I know the students exist then I can do their grades.

I used triggers and stored procedures on the destination side, since I inserted/updated a view, so that I could actually send data to several tables, in one query.

So, it depends on the source of the data and how complex it will be, as, I could reproduce what SSIS does, but why take the time since MS did it moderately well.


If you're working with tables only a stored procedure is the option since the SQL engine will actually be executing the job whilst SSIS is executed by DTExec engine. SSIS is a tool designed to work with files, external sources and so on.


If you're moving data from one sql server to another and not doing anything else then just using sql might be better as it's simpler and so doesn't add the extra overhead of a different toolset to learn and manage.

If on the other hand you have complex transformations, you want error handling, repeatable and testable package configurations or if you're pulling in data from other sources such as text files, spreadsheets, etc then SSIS does a good (if somewhat clunky and fiddly) job that'll save time.

So...depends on the situation really...


As a general rule, and with some exceptions: always use SQL if you can do it.

SSIS is nice where you need to access files (csv, excel, txt, ...) or very old databases (may be DBASE or something like that) but in pure database transactions there is few things, if any, that you can not do with pure, clean and easy SQL ;)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜