SQL Server Query Fundamentals: A Simple example of a Query that uses PIVOT:

Problem:

Convert the following source data into a schema shown below:

Solution:

Here’s the code that uses the PIVOT operator to get to the solution; please use it as a starting point.

Note the use of the aggregate function AVG; which aggregate you pick will depend on the requirement. In this example, Test_Val needs to be averaged if more than one test was performed.


-- source data
SELECT [Product_ID], [Test_Desc], [Test_Val] FROM [dbo].[Address];
GO

-- Destination data using the PIVOT operator
SELECT *
FROM [dbo].[Address]
PIVOT (AVG([Test_Val]) FOR [Test_Desc] IN ([Test1], [Test2], [Test3], [Test4], [Test5])) AS Tests;
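To try the same pattern without the original table, here is a minimal self-contained sketch. The table variable @Tests and its sample values are assumptions made up for illustration; only the column names come from the query above. Selecting the source columns through a derived table is a common precaution, because PIVOT groups by every column that is not named in the aggregate or the FOR clause.

```sql
-- Hypothetical sample data: the table variable and its values are assumptions.
DECLARE @Tests TABLE (
    Product_ID int,
    Test_Desc  varchar(10),
    Test_Val   decimal(9, 2)
);

INSERT INTO @Tests (Product_ID, Test_Desc, Test_Val)
VALUES (1, 'Test1', 10), (1, 'Test1', 20), (1, 'Test2', 5), (2, 'Test1', 7);

-- One row per product, one column per test, averaging repeated tests.
SELECT Product_ID, [Test1], [Test2]
FROM (SELECT Product_ID, Test_Desc, Test_Val FROM @Tests) AS src
PIVOT (AVG(Test_Val) FOR Test_Desc IN ([Test1], [Test2])) AS Tests
ORDER BY Product_ID;
```

With this data, product 1 has two Test1 rows (10 and 20) that are averaged to 15, and product 2 has no Test2 row, so that column comes back as NULL.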

SSIS: Using Data Profiling Task to check the candidate key profile of unknown data source(s)

As a part of Business Intelligence projects, we spend a significant amount of time extracting, transforming and loading data from source systems. So it’s always helpful to know as much as you can about the data sources: NULLs, keys and statistics, among other things. One of the things that I like to do when the data is unknown is to verify the candidate keys, so that the key I use uniquely identifies the rows in the data. It’s really helpful to do this upfront because it avoids a lot of duplicate value errors later in your projects.

So here’s a quick tutorial on how you can check the candidate key profile using the Data Profiling Task in SSIS. You need to perform two main tasks:
1. Generate the XML file using the Data Profiling Task in SSIS
2. View the content of the XML file using the Data Profile Viewer tool or using the Open Profile Viewer option in the Data Profiling Task editor in SSIS.

Here are the steps:
1a. Open SQL Server Data Tools (Visual Studio/BIDS) and open an SSIS project.
1b. Drag a Data Profiling Task onto the Control Flow.
1c. Open the Data Profiling Task editor and configure the destination that the task uses to create the XML file. You can either create a new connection or use an existing one. If you use an existing connection, set the OverwriteDestination property to True if you want the file at the destination to be overwritten.

1d. Click on Quick Profile to configure the data source for the Data Profiling Task.

1e. In the Quick Profile form, select the connection and the table/view, and specify what you need to compute. For a candidate key profile, make sure that the candidate key profile box is checked.

1f. Run the task; an XML file should be placed at the destination you specified in step 1c.

Now, it’s time to view what the profiler captured.

2a. You can open “Data Profile Viewer” by searching for its name in the Start menu.

2b. Once it opens up, click on Open and browse to the XML file generated by the Data Profiling Task.

2c. Once the file opens up, you can see the candidate key profiles.

2d. Alternatively, you can open the Data Profile Viewer from the “Data Profiling Task” in SSIS. Go to the Editor > Open Profile Viewer.
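If you want to cross-check a candidate key by hand, a plain T-SQL query can answer the same question. This is only a sketch: the table variable @Customers, its columns and its values are hypothetical, and the "uniqueness" ratio is simply distinct values divided by total rows, not necessarily the exact figure the profiler reports.

```sql
-- Hypothetical sample data: table, columns and values are assumptions.
DECLARE @Customers TABLE (CustomerID int, Email varchar(50));

INSERT INTO @Customers (CustomerID, Email)
VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'a@example.com');

-- A ratio of 1.0 means every row has a distinct Email,
-- so Email could serve as a candidate key.
SELECT COUNT(*)              AS total_rows,
       COUNT(DISTINCT Email) AS distinct_values,
       CAST(COUNT(DISTINCT Email) AS decimal(9, 4)) / NULLIF(COUNT(*), 0) AS uniqueness
FROM @Customers;

-- List the values that break uniqueness.
SELECT Email, COUNT(*) AS occurrences
FROM @Customers
GROUP BY Email
HAVING COUNT(*) > 1;
```

Here Email is not a valid key because 'a@example.com' appears twice, while CustomerID would pass the same check.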

Conclusion:
In this post, you saw how to profile data using the Data Profiling Task in SSIS.

SSIS – How to use Execute SQL Task to assign value to a variable?

Problem:

How to use Execute SQL Task in SSIS to assign value to a variable?

Solution:

This is a beginner level post, so I’ll show you how you can use the Execute SQL Task to assign a single value to a variable. Note that a variable can also be assigned a full result set. With that said, here are the steps:

1. Create the query against the source system

Example: (Note the column name, this will be handy later!)

2. Open the SSIS project > create the variable.

3. Now, drag an Execute SQL Task onto the Control Flow and rename it. Then go to Edit and configure the SQL Statement section.

4. Now, since we want to store a value in the variable, change the Result Set property to Single Row.

5. One last step: go to the Result Set section and map the Result Name (remember the column name from #1?!) to the Variable Name.

That’s it! Related article: How to see value of variable during Run Time?
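For step 1, the query needs to return exactly one row, and giving the column an explicit alias makes the Result Name mapping in step 5 easy to get right. The sketch below is an assumption about what such a query could look like; the table variable @Orders, its values and the alias MaxModifiedDate are all hypothetical.

```sql
-- Hypothetical sample data standing in for a source table.
DECLARE @Orders TABLE (OrderID int, ModifiedDate datetime);

INSERT INTO @Orders (OrderID, ModifiedDate)
VALUES (1, '20130101'), (2, '20130115'), (3, '20130110');

-- Exactly one row, one aliased column. In the Execute SQL Task you
-- would type MaxModifiedDate as the Result Name and map it to your variable.
SELECT MAX(ModifiedDate) AS MaxModifiedDate
FROM @Orders;
```

An aggregate without GROUP BY always returns a single row, which suits the Single Row result set; a query that can return zero rows would make the task fail at run time.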

Conclusion:

In this post, you saw how to use Execute SQL Task in SQL server integration services to assign a value to a variable.

Design pattern for making staging table loads incremental in SSIS:

Summary:

This is a beginner level post targeted at developers who are new to SSIS and may not have worked on making an SSIS staging load package incremental. In this post, I’ll share a design pattern that I’ve used to make staging loads incremental, so that they pull in just the new or changed rows from the source system.

Tutorial:

Before we begin, why would you want to make a staging load incremental when pulling data from source systems? Two main reasons: 1) the source system may not keep historical data but your Business Intelligence system needs to have it 2) it is also faster and puts less strain on source system while doing data pull.

Since this is a beginner level post, I am going to show you a design pattern for the case where you have a column in the source system that can identify new or changed rows. If you do not have such a column, this topic becomes an advanced one and is out of scope for now.

With that said, let’s see the steps involved.

1) I have this kill and fill (a.k.a. Full Load) package in my SSIS dev environment.

2) Now, let’s make this incremental. I’ll go ahead and delete the Execute SQL Task that truncates the data.

3) Now, we need a way to be able to pass in the query in our DFT that gets only the new or changed rows. The source system that I am using has a field called modified date and that’s what I’ll be using to pull in new or changed data.

4) Let’s create the query with the help of variables, an Execute SQL Task and a Script Task. (Later, we’ll store the query in a variable and use that variable in the Data Flow Task.)

4a) Create the ModifiedDate and Query variables.

4b) Create an Execute SQL Task to run the query that gets the max ModifiedDate and writes it to the ModifiedDate variable that you created.

Related Post: How to use Execute SQL Task to assign value to a variable?

4c) Create a Script Task to build the query using the ModifiedDate variable. This query will extract only new or changed rows from your source system.


' Read the max ModifiedDate that the Execute SQL Task stored in the variable
Dim ModifiedDate As String
Dim sQuery As String
ModifiedDate = Dts.Variables("ModifiedDate").Value.ToString()
' Build the source query that pulls only rows modified on or after that date
sQuery = String.Concat("SELECT [SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],[ModifiedDate] FROM [sales].[SalesOrderDetail] where [ModifiedDate] >= '", ModifiedDate, "'")
' Debugging aid: shows the generated query while the package runs
MsgBox(sQuery)
Dts.Variables("Query").Value = sQuery

5) Now, go to the Variables section and give a default value to the User::Query variable, because if you do not do this you won’t be able to complete the next steps.

6) Go to the Data Flow and change the OLE DB Source to use the “SQL command from variable” data access mode with the User::Query variable.

7) Switch to the Control Flow and make sure your precedence constraints are set to run Execute SQL Task > Script Task > Data Flow Task.

8) Run the package and you should see the dynamic query that gets generated.
Tip: sometimes it’s helpful to run the generated query against the source system for troubleshooting purposes.

9) After a successful run of the package, verify that only new rows got added to the staging table. Also, if there are duplicate rows in the staging table, this might need to be handled during the dimension load or fact load. You can also consider putting logic in place here to avoid duplicate records in your staging table.
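The whole pattern can also be sketched in plain T-SQL, which is handy for checking the logic outside the package. Everything in this sketch is an assumption made for illustration: the table variables @Source and @Staging, the sample rows, and the '19000101' fallback date for an empty staging table. The NOT EXISTS filter is one possible way to handle the duplicates mentioned in step 9, since the >= comparison re-selects rows that carry the current max date.

```sql
-- Hypothetical source and staging tables with sample rows.
DECLARE @Source  TABLE (SalesOrderDetailID int, OrderQty int, ModifiedDate datetime);
DECLARE @Staging TABLE (SalesOrderDetailID int, OrderQty int, ModifiedDate datetime);

INSERT INTO @Source  VALUES (1, 5, '20130101'), (2, 3, '20130102'), (3, 8, '20130103');
INSERT INTO @Staging VALUES (1, 5, '20130101'), (2, 3, '20130102');

-- Step 4b: latest date already staged; fall back to an early date
-- so that the first run behaves like a full load.
DECLARE @MaxModifiedDate datetime;
SELECT @MaxModifiedDate = ISNULL(MAX(ModifiedDate), '19000101') FROM @Staging;

-- Step 4c: pull rows at or after that date, skipping rows already staged.
INSERT INTO @Staging (SalesOrderDetailID, OrderQty, ModifiedDate)
SELECT s.SalesOrderDetailID, s.OrderQty, s.ModifiedDate
FROM @Source AS s
WHERE s.ModifiedDate >= @MaxModifiedDate
  AND NOT EXISTS (SELECT 1
                  FROM @Staging AS t
                  WHERE t.SalesOrderDetailID = s.SalesOrderDetailID
                    AND t.ModifiedDate = s.ModifiedDate);

SELECT SalesOrderDetailID, OrderQty, ModifiedDate
FROM @Staging
ORDER BY SalesOrderDetailID;
```

After the insert, the staging table holds rows 1, 2 and 3: row 2 matches the date filter but is skipped because it is already staged, and only row 3 is added.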

That’s it!

Conclusion:

In this post, you saw how to make a staging load package incremental.

Similar Blog:

SQL Server Integration services: How to write a package that does Set based updates?

SQL server Integration services: How to solve “The value violated the integrity constraints for the column” error?

Problem:

You are working on an SSIS package to load a table from a source system and you get the error “The value violated the integrity constraints for the column”. How do you solve it?

Solution:

One of the things that the error message should also tell you is the column name. What you want to do is check the definition of the destination table for any integrity constraints like NOT NULL or PRIMARY KEY. Once you have that information, go back to your source and figure out if it’s trying to add NULL values to a column that has a NOT NULL constraint. Or maybe the ETL logic is trying to insert a duplicate value into a column that has a primary key constraint.

Also, don’t alter the destination table to accept NULLs or remove the integrity constraint. You want to put the logic in your ETL or fix the data integrity at the source. You can use T-SQL functions like ISNULL or COALESCE to handle NULL values while querying source systems.
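The checks described above can be run against the source before the load. The following is a minimal sketch; the table variable @Source, its columns and the 'Unknown' default are hypothetical and only illustrate the two violations discussed here.

```sql
-- Hypothetical source data containing one NULL and one duplicate key.
DECLARE @Source TABLE (CustomerID int NULL, CustomerName varchar(50) NULL);

INSERT INTO @Source (CustomerID, CustomerName)
VALUES (1, 'Ann'), (2, NULL), (2, 'Bob');

-- Rows that would violate a NOT NULL constraint on CustomerName.
SELECT CustomerID, CustomerName
FROM @Source
WHERE CustomerName IS NULL;

-- Key values that would violate a PRIMARY KEY on CustomerID.
SELECT CustomerID, COUNT(*) AS occurrences
FROM @Source
GROUP BY CustomerID
HAVING COUNT(*) > 1;

-- Replace NULLs with a default while querying the source.
SELECT CustomerID, ISNULL(CustomerName, 'Unknown') AS CustomerName
FROM @Source;
```

Whether a default such as 'Unknown' is acceptable is a business decision; the queries only show where the offending rows are.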

Conclusion:
In this post, we saw how to solve the “The value violated the integrity constraints for the column” error in SSIS.

SQL Server Integration services: How to write a package that does Set based updates?

Problem:

If you have a sizable number of rows that need to be updated in SSIS, you don’t want to issue row-by-row update commands because that won’t be efficient. In that case you can use set-based updates instead. It’s a common design pattern for loading dimensions in a data warehouse.

Find the steps below:

Solution:

There are two main steps to achieve this:

1) Populate the “update” table with rows that have been changed. Note that a new table needs to be created.

2) Run the SQL command to do a SET based update

Let’s see each step in detail:

1)  Populate the “update” table with rows that have changed.

For this step, first make sure that you have a table that can hold the rows that have been updated.

Then create a Data Flow that takes the source data, looks up the rows that have changed and puts them in an update/staging table.

Note: I’ve used a small table for demo purposes, but you wouldn’t use this method unless you have many rows to update, because as you can see it adds the overhead of putting the data in the update table first.

2) Run the SQL command to do a SET based update

Here’s the sample query:

-- run the update command
UPDATE Dim
SET
    [Column1] = Upd.[Column1],
    [Column2] = Upd.[Column2]
--  ,[Column3] = Upd.[Column3]
--  ...
FROM dbo.DimDestination AS Dim
INNER JOIN dbo.Destination_update AS Upd
    ON Dim.Destination_sk = Upd.Destination_sk;

-- Truncate update table
TRUNCATE TABLE dbo.Destination_update;
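To experiment with the pattern end to end, here is a self-contained sketch that uses temp tables in place of the real dimension and update tables. The temp table names, columns and sample rows are assumptions. Wrapping the update and the truncate in one transaction is a design choice of this sketch, not something the steps above require; it keeps the update table from being emptied if the update fails.

```sql
-- Hypothetical stand-ins for the dimension table and the update table.
CREATE TABLE #DimDestination (
    Destination_sk int PRIMARY KEY,
    Column1 varchar(20),
    Column2 varchar(20)
);
CREATE TABLE #Destination_update (
    Destination_sk int PRIMARY KEY,
    Column1 varchar(20),
    Column2 varchar(20)
);

INSERT INTO #DimDestination VALUES (1, 'old', 'old'), (2, 'keep', 'keep');
INSERT INTO #Destination_update VALUES (1, 'new', 'new');

BEGIN TRANSACTION;

-- One statement updates every changed row at once.
UPDATE Dim
SET Column1 = Upd.Column1,
    Column2 = Upd.Column2
FROM #DimDestination AS Dim
INNER JOIN #Destination_update AS Upd
    ON Dim.Destination_sk = Upd.Destination_sk;

-- Empty the update table so the next run starts clean.
TRUNCATE TABLE #Destination_update;

COMMIT TRANSACTION;

SELECT Destination_sk, Column1, Column2
FROM #DimDestination
ORDER BY Destination_sk;

DROP TABLE #DimDestination;
DROP TABLE #Destination_update;
```

Row 1 ends up with the new values and row 2 is untouched, because only keys present in the update table take part in the join.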

Conclusion:

In this post, we saw how to write a package in SSIS that does SET based updates.

SSIS Error on opening a package: The connection “{GUID}” is not found. The error is thrown by connections collection when the specific connection element is not found

In my case, this error came up in SSIS after some copy-pasting happened in our TFS. I tried opening a package and it gave the “The connection {GUID} is not found…” error. So here’s how I was able to solve it:

1. After I got the latest version of the files, I navigated to the Integration Services package file on my local machine.

2. I opened the file in Notepad to look at the XML.

3. Once I could see the XML, I searched for the connection GUID “xyz…” that was showing up in the error.

4. Once you locate the GUID, figure out which package component uses the connection. In my case it was an “Execute SQL Task”.

5. I then opened my package and fixed the connection in the task.

That’s about it for this post. I hope this helps someone out there.

How to strip double quotes while importing data from CSV or TSV using SSMS Import Data wizard OR SSIS?

Long title! Let me explain. This post will help you solve the following problem if you run into it:

1) You are using the SSMS Import Data wizard to load data from a comma (or tab) separated value (CSV/TSV) file into a SQL Server table and you find that your source data values have double quotes, so you want to strip them before loading the destination table.

2) You are using SSIS to load data from a CSV/TSV file into a SQL Server table and you want to strip the double quotes from the source fields before you load the data into the destination table.

Solution:

1. While configuring the Flat File connection, you’ll reach a point where you see the “Flat File Connection Manager” in SSIS. In the SQL Server Import and Export wizard, you’ll see a dialog box to configure flat file connections.

2. In the Text qualifier box, enter a double quote (").

3. Make sure to preview the data to verify that the double quotes around the data fields have been stripped.

4. That’s it! You’ve successfully configured the flat file connection manager to strip double quotes.
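If the data has already been loaded with the quotes still in place, a different technique is to clean it up afterwards in T-SQL. Note that this is not the text qualifier setting described above, just a fallback sketch; the table variable @Loaded and its values are hypothetical.

```sql
-- Hypothetical rows as they might look after a load without a text qualifier.
DECLARE @Loaded TABLE (City varchar(50));

INSERT INTO @Loaded (City)
VALUES ('"Seattle"'), ('Portland'), ('"New York"');

-- Strip one leading and one trailing double quote, and only when both exist,
-- so quotes inside a value are left alone.
SELECT City AS raw_value,
       CASE
           WHEN LEN(City) >= 2 AND LEFT(City, 1) = '"' AND RIGHT(City, 1) = '"'
               THEN SUBSTRING(City, 2, LEN(City) - 2)
           ELSE City
       END AS cleaned_value
FROM @Loaded;
```

Setting the text qualifier is still the better option when you control the load, because it also handles delimiters that appear inside quoted fields.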

Recapping my social media activities during Jan 1 – Feb 24 2013:

Recapping my social media activities during Jan 1 – Feb 20 2013:

That’s about it for this post.

If you want to read related past posts, here they are:

OCT 3 – OCT 10 2012

OCT 11 – OCT 18 2012

OCT 19 – NOV 11 2012

NOV 12 – DEC 31 2012


How to view error(s) that occur during Master data Service’s Staging Process?

To view error(s) that occur during MDS’s staging process, we have two views: stg.viw_name_MemberErrorDetails & stg.viw_name_RelationshipErrorDetails. For the purpose of this blog post, let me show you how I (as an administrator) view information from stg.viw_name_MemberErrorDetails.

1) Name of the entity: supplier & Name of model is Suppliers

2) I imported data into staging table stg.supplier_leaf via SSIS

3) Now here’s how you can see the errors that occur during MDS’s staging process:

Go to the MDS web application to initiate the staging process > start the batch > after completion you can see the status as well as whether it has any errors.

4) Now if you see that there’s an error, you can go to stg.viw_name_MemberErrorDetails to see what the errors are. In my case, I am going to run the query:

select * from stg.viw_supplier_MemberErrorDetails where Batch_ID=2

You can get the above query via MDS web application too:

5) And as you can imagine, you can get access to this error data via SSIS (SQL Server Integration Services) too. So if you have a workflow that a. loads data to MDS and b. initiates the batch process via a stored procedure, then you can program it to read the errors from the stg.viw_name_MemberErrorDetails and stg.viw_name_RelationshipErrorDetails views.

That’s about it for this post! I hope this helps. Your comments are very welcome!