Saturday, October 11, 2008

Handling Upserts Using 'Slowly Changing Dimension' in SSIS

Many of us would have faced a scenario where we have to bring in records from a source and insert if records don't exist or update if exists (also referred to as Upserts).This can now easily be done using 'Slowly Changing Dimension' (SCD) in SSIS 2005/2008.

In my example, I have a dimension table called dimPerson in Datawarehouse DB - 'DWDB' and a source table called Person in OLTP database - 'OLPTSource'.

My aim is to transfer records from OLPTSource.Person to DWDB.dimPerson based on these rules:

- If PersonID exists in DWDB and if there is any difference/change in columns FName, MName, LName, Address1, Address2, City, State or Zip then update the row with changes.

- If PersonID exists in DWDB and Phone number differs then mark the record as outdated and insert a new record for the PersonID

- If PersonID doesn't exists in DWDB then insert record

For this example you will need to create OLTPSource and DWDB databases and run the following script:

http://docs.google.com/Doc?id=ddstdscf_0c653h9cm

This will do the following –

- Create the two tables Person & dimPerson

- Populate Person & load dimPerson with Person records

- Update & insert statements to Person records to understand how SCD
handles these changes to source after the initial load of dimPerson

Here’s the steps & screenshots for creating the SSIS package -

1. Create a Sequence Container to hold Data Flow task. Drag and drop
a Data Flow Task. Create two OLEDB connections :

OLE DB Connection for datawarehouse database – DWDB
OLE DB Connection for OLTP source database - OLTPSource



2. Go to Data Flow view and configure OLEDB data source to retrieve records from OLTPSource.Person table. Drag and drop ‘Slowly Changing Dimension’ item.




3. Double Click on SCD to bring up Wizard



4. Select Key Type as ‘Business Key’ for PersonID



5. Select Change Type as ‘Historical Attribute’ for PhoneNo and ‘Changing Attribute’ for the rest.



6. Don’t check any of the boxes in the below screen



7. Select the column ‘Active’ to indicate current and expired rows. Enter 1 for current and 0 for expired value. If PhoneNo changes for a personID, then a new record will be inserted with a ‘Active’ column value of ‘1’. Old record will get the ‘Active’ column value updated to ‘0’.



8. Click finish on the next screen and you will notice that the wizard has updated items on the data flow view



9. Excute the package. If you ran the whole script in http://docs.google.com/Doc?id=ddstdscf_0c653h9cm , then you will see a similar result as in screen below. There were two updates to PhoneNo column, that shows up as 2 rows in ‘Historical Attribute Inserts Output’. ‘New Output’ shows up as 1 row because of the one row inserted in the script. Finally, the ‘Change Attribute Updates Output’shows up as 1 because of the one update statement changing Zip and Addess1. Notice that the update statement for PhoneNo and Zip for PersonID =2didn’t change Zip for historical record (active =0). You can change this behaviour by setting UpdateChangingAttributeHistory to True.



Check below msdn link to find out what more on SCD:
http://msdn.microsoft.com/en-us/library/ms141715.aspx