This is a beginner level post targeted at Developers who are new to SSIS and may not have worked on making a SSIS staging load package incremental. In this post, I’ll share a design pattern that I’ve used to make staging loads incremental which pulls in just new or changed rows from source system.
Before we begin, why would you want to make a staging load incremental when pulling data from source systems? Two main reasons: 1) the source system may not keep historical data but your Business Intelligence system needs to have it 2) it is also faster and puts less strain on source system while doing data pull.
since this is a beginner’s level, I am going to show you a design pattern when you have a column in the source system that can identify New or Changed Rows. If you do not have a column in the source system that identifies new or changed rows then this topic becomes an advanced level and is out of scope for now.
with that said, let’s see the steps involved.
1) I’ve this kill and fill (a.k.a Full Load) package in my SSIS dev environment:
2) now, let’s make this incremental. so I’ll go ahead and delete the Execute SQL Task that truncates the data.
3) Now, we need a way to be able to pass in the query in our DFT that gets only the new or changed rows. The source system that I am using has a field called modified date and that’s what I’ll be using to pull in new or changed data.
4) Let’s create the query using the help of variables, execute sql task and script task. (Later, we’ll store in the query in a variable and use that variable in the Data Flow Task)
4a) create ModfiedDate and Query variables
4b) create an Execute SQL Task to run the query to get the max ModifiedDate and write it in the ModifiedDate variable that you created.
Related Post: How to use Execute SQL Task to assign value to a variable?
4c) create a Script Task to get the query using the ModifiedDate variable. This query will extract only new or changed rows from your source system
Dim ModifiedDate As String
Dim sQuery As String
ModifiedDate = Dts.Variables("ModifiedDate").Value.ToString
sQuery = String.Concat("SELECT [SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],[ModifiedDate] FROM [sales].[SalesOrderDetail] where [ModifiedDate] >= '", String.Concat(ModifiedDate, "'"))
MsgBox(String.Concat(" ", sQuery))
Dts.Variables("Query").Value = sQuery
5) Now, go to variables section and give a default value to user::Query variable because if you do not do this you won’t be able to go to next steps.
6) Go to Data Flow and change the OLEDB source to use the SQL Command from variable and use the user::Query variable
7) Switch to Control flow and Make sure your precedence constraints are set to run Execute SQL Task > Script Task > Data Flow Task
8) Run the package and you should see the dynamic query that gets generated.
Tip: sometimes it’s helpful to run this query that’s generated against the source system for troubleshooting purpose.
9) On the successful run of the package verify that only new rows got added to the staging table. Also, if there are duplicate rows in the staging table, this might need to handled during the dimension load or fact load. you can also consider having the logic in place here to avoid duplicate records in your staging table.
In this post, you saw how to make a staging load package incremental.
SQL Server Integration services: How to write a package that does Set based updates?