开发者

How do you get an SSIS package to only insert new records when copying data between servers

I am copying some user data from one SqlServer to another. Call them Alpha and Beta. The SSIS package runs on Beta and it gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta's table. Pretty simple and that works great.

The problem is that I only want to add new rows into Beta. Normally I would just do something simple like....

INSERT INTO BetaPeople
 SELECT * From AlphaPeople
 where ID NOT IN (SELECT ID FROM BetaPeople)

But this doesn't work in an SSIS pack开发者_运维百科age. At least I don't know how and that is the point of this question. How would one go about doing this across servers?


Your example seems simple, looks like you are adding only new people, not looking for changed data in existing records. In this case, store the last ID in the DB.

CREATE TABLE dbo.LAST (RW int, LastID Int)
go
INSERT INTO dbo.LAST (RW, LastID) VALUES (1,0)

Now you can use this to insert the last ID of the row transferred.

UPDATE dbo.LAST SET LastID = @myLastID WHERE RW = 1

When selecting OLEDB source, set data access mode to SQL Command and use

DECLARE @Last int
SET @Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > @Last;

Note, I do assume that you are using ID int IDENTITY for your PK.

If you have to monitor for data changes of existing records, then have the "last changed" column in every table, and store time of the last transfer.

A different technique would involve setting-up a linked server on Beta to Alpha and running your example without using SSIS. I would expect this to be way slower and more resource intensive than the SSIS solution.

 INSERT INTO dbo.BetaPeople
 SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
 WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)


How do you get an SSIS package to only insert new records when copying data between servers

  1. Add a lookup between your source and destination.
  2. Right click the lookup box to open Lookup Transformation Editor.
  3. Choose [Redirect rows to no match output].

    How do you get an SSIS package to only insert new records when copying data between servers

  4. Open columns, map your key columns.
  5. Add an entry with the table key in lookup column , lookup operation as

    How do you get an SSIS package to only insert new records when copying data between servers

  6. Connect lookup box to destination, choose [Lookup no Match Output]


Simplest method I have used is as follows:

  • Query Alpha in a Source task in a Dataflow and bring in records to the data flow.
  • Perform any needed Transformations.
  • Before writing to the Destination (Beta) perform a lookup matching the ID column from Alpha to those in Beta. On the first page of the Lookup Transformation editor, make sure you select "Redirect rows to no match output" from the dropdown list "Specify how to handle rows with now matching error"
  • Link the Lookup task to the Destination. This will give you a prompt where you can specify that it is the unmatched rows that you want to insert.


This is the classical Delta detection issue. The best solution is to use Change Data Capture with/without SSIS. If what you are looking for is a once in a life time activity, no need to go for SSIS. Use other means such as linked server and compare with existing records.


The following should solve issue of loading Changed and New records using SSIS:

  1. Extract Data from Source usint Data flow.
  2. Extract Data from Target.
  3. Match on Primary key Add Unmatch records and split matched and unmatched records from Source and Matched records from Target call them Matched_Source, Unmatch_Source and Matched_Target.
  4. Compare Matched_Source and Matched_Target and Split Matched_Source to Changed and Unchanged.
  5. Null load TempChanged Table.
  6. Add Changed Records to TempChanged.
  7. Execute SQL script/stored proc to Delete Records from Target for primary key in TempChanged and add records in TempChanged to Target.
  8. Add Unmatched_Source to Target.


Another solution would be to use a temporary table.

In the properties for Beta's connection manager, change RetainSameConnection to true (by default SSIS runs each query in it's own connection, this would mean the temporary table would be killed as soon as it has been created).

Create a SQL Task using Beta's connection and use the following SQL to create your temporary table:

SELECT TOP 0 *
INTO ##beta_temp
FROM Beta

Next create a data flow that pulls data from Alpha and loads into ##beta_temp (you will need to run the SQL statement above on SSMS first so that Visual Studio can see the table at design time and you will also need to set the DelayValidation property to true on the Data Flow task).

Now you have two tables on the same server and you can just use your example SQL modified to use the temporary table.

INSERT INTO Beta
SELECT * FROM ##beta_temp
WHERE ID NOT IN (SELECT ID FROM Beta)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜