SSIS Lookup with Lookup Component Vs Script Component
I need to load Dimensions from EDW Tables (which does maintain historical records) and is of type Key-Value-Parameter.
My scenario is ok if got a record in EDW as below
Key1 Key2 Code Value EffectiveDate EndDate CurrentFlag
100 555 01 AAA 2010-01-01 11.00.00开发者_如何学Python 9999-12-31 Y
100 555 02 BBB 2010-01-01 11.00.00 9999-12-31 Y
This need to be loaded into DM by pivoting it as
key1 and key2 combinations makes Natural key for DM
SK NK 01 02 EffectiveDate EndDate CurrentFlag
1 100-555 AAA BBB 2010-01-01 11.00.00 9999-12-31 Y
My ssis package does this all good pivoting... looking up the incoming NK in DIM.. if new will insert .. else with further lookup with effective date and determine if the incoming for same natural key got any new (change) in attribute.. if so updates the current record byy setting its end date and insert the new one with new attribute value and pulling the recent records values for other attributes.
My problem is if the same natural key comes twice with same attribute in single extract my first lookup which on natural key .. will let both records pass and try to insert.. where its fails. If i get distinct records on NK the second is not picked and need to run package again.
So my question how can i configure lookup or alernative way to handle this scenario when same NK comes twice in single extract, would be able to insert first record if not exists in Dim table and for second one should be able to updated with the changes with reference to one inserted above.
Not sure this makes sense what am trying to explain. Will attached the screenshot once back to work desk (on monday).
Thanks
Lookup is not good for this - with caching and everything, it's just not able to lookup on it's previously set values.
You might be better off passing it to a SQL Command task and having a stored proc do an insert or expire/insert depending on what it finds.
You could also stream them to a table and do it in a batch.
To address your flow and the model it's trying to populate:
To start with, it's always awkward when the order of rows in the input causes behavioral differences - i.e. NK = A, Val = 1 then NK = A, Val = 2 gives different behavior than NK = A, Val = 2 then NK = A, Val = 1. One has to wonder if this is the correct dimensional design. Remember that all dimensional attributes are assigned to dimensional tables based on a pragmatic choice. Ultimately dimensions can be arranged into tables at will - so a different design might make more sense. If things are changing within a single load, that may indicate you need to break up that load to match the grain (not attempting to load 2 days data at one time).
I notice that there is an Effective Date and an End Date in your dimension. Right now this sounds a lot like a property of the dimension behavior (where your 01 and 02 codes are changing on a NK) and not of the facts that this dimension is going to be attached to. This might indicate that it needs to be tracked in a separate factless fact table, say (SK, EffectiveDate, EndDate) - or that it's just not important, because all you care about is a NK, 01, 02 combination attached to a fact, in which case, your natural key is really all of NK, 01, 02.
I recommend going back to your fact table and the incoming feed and the expected usage and looking at it more closely and see if this maybe need to be a separate factless fact table tracking these dimension changes.
Also, if you could post more details, that might help, and I'll see what Kimball's materials have to say about it when I see more of a business case.
Cade's comments are spot on - but I believe your main issue is the duplicates, period. Does the fact that you have two versions of the same NK in the "source" stream indicate two separate, meaningful versions? Or does only the "last" version matter?
If changes reflected in both versions should be reflected in your dimension table, then I echo Cade's suggestion to divide your processing into batches. You could sort your input by the NK (and time of change), then use a row counting script to enumerate the "versions" of each NK, then process "batches" by version number.
If only the last "version" need be incorporated into the dimension table, I suggest you eliminate duplicates prior to using a Lookup.
精彩评论