SSIS: Using Data Profiling Task to check the candidate key profile of unknown data source(s)

As a part of Business Intelligence projects, we spend a significant amount in extracting, transforming and loading data from source systems. So it’s always helpful to know as much as you can about the data sources like NULLS, keys, statistics among other things. One of the things that I like to do if the data is unknown is to make sure that I get the candidate keys correct to make sure the key used can uniquely identify the rows in the data. It’s really helpful if you do this upfront because it would avoid a lot of duplicate value errors in your projects.

So here’s a quick tutorial on how you can check the candidate key profile using data profiling task in SSIS, You need to perform two main tasks:
1. Generate the xml file using the Data profiling task in SSIS
2. View the content of the xml file using the Data Profile Viewer Tool or using the Open Profile Viewer option in the Data Profiling task editor in SSIS.

Here are the steps:
1a. Open SQL Server Data Tools (Visual Studio/BIDS) and the SSIS project type
1b. Bring in Data Profiling Task on Control Flow
1c. Open the Data Profiler Task editor and configure the destination folder that the tasks uses to create the XML file. You can either create a new connection or use an existing one. If you use an existing connection, make sure that you are setting the OverwriteDestination property to True if you want the file to be overwritten at the destination.

1 SSIS Data Profiling Task Data Cleaning Candidate Key

1d. Click on Quick Profile to configure the data source for the data profiler task

2 SSIS Data Profiling Task Data Cleaning Candidate Key

1e. In the quick profile form, you’ll need to select the connection, table/view and also specify what you to need to computer. For candidate key profile, make sure that the candidate key profile box is checked.

3 SSIS Data Profiling Task Data Cleaning Candidate Key

1f. Run the Task and a XML file should be placed at the destination you specified in step 1C.

Now, It’s time to view what profiler captured.

2a. you can open “Data Profile Viewer” by searching for its name in the start button.

4 SSIS Data Profiling Task Data Cleaning Candidate Key

2b. once it opens up, click on open and browse to the xml file generated by the data profiling task.

5 SSIS Data Profiling Task Data Cleaning Candidate Key

2c. once the file opens up, you can the candidate key profiles.

6 SSIS Data Profiling Task Data Cleaning Candidate Key

2d. Alternatively, You can also open the data profile viewer from the “Data Profiling Task” in SSIS. Go to the Editor > Open Profile Viewer:

7 SSIS Data Profiling Task Data Cleaning Candidate Key

In this post, you saw how to profile data using the Data Profiling Task in SSIS.

PASS Business Analytics VC: 7 Ideas on Encouraging Advanced Analytics by Mark Tabladillo #sqlpass

Thu, Jul 17, 2014 12:00 PM – 1:00 PM EDT

Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.

Mark Tabladillo is a Microsoft MVP and SAS expert based in Atlanta, GA. His Industrial Engineering doctorate (including applied statistics) is from Georgia Tech. Today, he helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He has spoken at major conferences including Microsoft TechEd, PASS Summit, PASS Business Analytics Conference, Predictive Analytics World, and SAS Global Forum. He tweets @marktabnet and blogs at


hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

TSQL – Quick note about numeric data type to solve “Arithmetic overflow error”


You are working on a query where you are trying to convert source data to numeric data type and you get an “Arithmetic overflow error”.


Let’s understand this with an example:

Here’s the source data: 132.56000000 and you want to store just 132.56 so write a query that looks like:

cast( as numeric(3,2)) as destination_column_name

and after you run the query its throws an error “Arithmetic Overflow Error” – so what’s wrong?

The issue is that you incorrectly specified the precision and scale – by writing the query that says numeric(3,2) you are saying I want 3 data places with 2 on the right (after decimal point) which leaves just 1 place for left.

what you need to write is numeric(5,2) – and this will have 2 places on the right and leaves 3 places for left.

so after you run this, it shouldn’t complain about the arithmetic overflow error. you just need to make sure that the precision and scale of the numeric data type is correct.


In this post, you saw an example of how to correctly use the precision and scale in the numeric data type and that should help you solve the arithmetic overflow errors.

SSIS – How to use Execute SQL Task to assign value to a variable?


How to use Execute SQL Task in SSIS to assign value to a variable?


This is a beginner level post so I’ll show you how you can use Execute SQL Task to assign a value to a variable. Note that variables can also be given full result set. With that said, here are the steps:

1. Create the query against the source system

Example: ((Note the column name, this will be handy later!)

1 Execute SQL Task SSIS Query

2. Open SSIS Project > Create the variable


Variable SSIS Create Steps3. Now, drag a Execute SQL Task to Control Flow. Rename it. And go to Edit. Configure SQL Statement Section

Execute SQL Statement SSIS4. Now, since we want to store a value to the variable, change the Result Set property to Single Row

Single ROW SQL Statement Server SSIS5. One last step, go to result set section and map Result Name (remember the column name from #1?!) with Variable Name:

Result Set SSIS Execute SQL TaskThat’s it! Related article: How to see value of variable during Run Time?


In this post, you saw how to use Execute SQL Task in SQL server integration services to assign a value to a variable.

SQL Server Analysis services – How to set the order by attribute sort key?


How to sort the dimension attribute by something other than the key and name column? How do you set the “OrderBy” property?

Example: You have created an Inventory age buckets 1-50,51-100,101-150 and so if a business user uses this dimension attribute then the sorting won’t be logical. It would be 1-50, 101-150,51-100 – so how to show the buckets in the logical order?


1. make sure that the table/view that you are bringing in has the sort key.


1 SSAS Attribute order by sort key2. Now, switch to SSAS and open your dimension. I am assuming that you’ve already configured your data source views and you are already bringing in these columns in the dimension:

Dim Inventory SSAS SSIS VIEW Data source VIEW

3. Let’s start with hiding Aging Bucket Sort key so that it’s not visible to user. Change the AttributeHierarchyVisible to False

4. Now, switch to Attribute Relationships – Right Click on Aging Bucket and click on New Attribute Relationship. And set the attribute relanship between Aging bucket and Aging Bucket Sort Key

Attribute Relationships SSAS

And you should see something like this in your attribute relationship section:

SSAS Attribute Relationship Sort Key

5. Now, one more thing to configure. Go back to dimension structure section. Open the properties section for the Aging Bucket Attribute and change the OrderBy property to AttributeKey. Also, change the orderByAttribute property to Aging Bucket Sort Key (in your case, choose the sort key that you have)

SSAS Order Sort by attribute property

That’s it, after you process the model then you should see the attribute being sorted based on the sort key that you had.


In this post, you saw how to configure sort/order property of a dimension attribute.

Design pattern for making staging table loads incremental in SSIS:


This is a beginner level post targeted at Developers who are new to SSIS and may not have worked on making a SSIS staging load package incremental. In this post, I’ll share a design pattern that I’ve used to make staging loads incremental which pulls in just new or changed rows from source system.


Before we begin, why would you want to make a staging load incremental when pulling data from source systems? Two main reasons: 1) the source system may not keep historical data but your Business Intelligence system needs to have it 2) it is also faster and puts less strain on source system while doing data pull.

since this is a beginner’s level, I am going to show you a design pattern when you have a column in the source system that can identify New or Changed Rows. If you do not have a column in the source system that identifies new or changed rows then this topic becomes an advanced level and is out of scope for now.

with that said, let’s see the steps involved.

1) I’ve this kill and fill (a.k.a Full Load) package in my SSIS dev environment:

1 Full Load Source Table Destination SSIS2) now, let’s make this incremental. so I’ll go ahead and delete the Execute SQL Task that truncates the data.

3) Now, we need a way to be able to pass in the query in our DFT that gets only the new or changed rows. The source system that I am using has a field called modified date and that’s what I’ll be using to pull in new or changed data.

4) Let’s create the query using the help of variables, execute sql task and script task. (Later, we’ll store in the query in a variable and use that variable in the Data Flow Task)

4a) create ModfiedDate and Query variables

4b) create an Execute SQL Task to run the query to get the max ModifiedDate and write it in the ModifiedDate variable that you created.

Related Post: How to use Execute SQL Task to assign value to a variable?

4c) create a Script Task to get the query using the ModifiedDate variable. This query will extract only new or changed rows from your source system

Dim ModifiedDate As String
Dim sQuery As String
ModifiedDate = Dts.Variables("ModifiedDate").Value.ToString
sQuery = String.Concat("SELECT [SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],[ModifiedDate] FROM [sales].[SalesOrderDetail] where [ModifiedDate] >= '", String.Concat(ModifiedDate, "'"))
MsgBox(String.Concat("   ", sQuery))
Dts.Variables("Query").Value = sQuery

5) Now, go to variables section and give a default value to user::Query variable because if you do not do this you won’t be able to go to next steps.

6) Go to Data Flow and change the OLEDB source to use the SQL Command from variable and use the user::Query variable

7) Switch to Control flow and Make sure your precedence constraints are set to run Execute SQL Task > Script Task > Data Flow Task

8) Run the package and you should see the dynamic query that gets generated.
Tip: sometimes it’s helpful to run this query that’s generated against the source system for troubleshooting purpose.

SSIS Incremental Load Staging Table

9) On the successful run of the package verify that only new rows got added to the staging table. Also, if there are duplicate rows in the staging table, this might need to handled during the dimension load or fact load. you can also consider having the logic in place here to avoid duplicate records in your staging table.

That’s it!

SSIS Incremental Load Staging Table 2


In this post, you saw how to make a staging load package incremental.

Similar Blog:

SQL Server Integration services: How to write a package that does Set based updates?

SQL Server Analysis Services (SSAS) multidimensional cube: How to create a role playing dimension?


I was recently asked by a SSAS developer about how to set up a role-playing dimension so I did a brief demo to show how to set it up. In this post, I am document the steps to help you get started:

What is role-playing dimension?

What if you want to see sales data by Order Date as well as Shipped Date? You can one date dimension that is used more than once for seeing sales data by order date and shipped date. In this case, date dimension becomes a Role Playing Dimension.


1. Open SQL Server Data Tools and SSAS project that you are working on.

2. Go to Data Source Views from the solution explorer.

3. Make sure that the relationships are specified between dimension (e.g. date dimension) and Fact Table (e.g. Fact Sales). Example: If you’ve OrderDateSK and ShippedDateSK in your fact table, then establish a relationship between the both columns with the Primary Key of the Date dimension.

SSAS Data Source View Role playing dimension

Note: If you do not have the relationships specified in the data source views, there’s an alternative to go to dimension usage for a cube and setup role playing dimensions. since this is targeted to get you started, I am not covering the steps for that.

4. Now go to the Dimensions section in the solution explorer and add a Dimension that uses the existing data source view from step #3.

5. switch to Cubes.

Now, if you don’t have a cube created yet then you can use the cube wizard if you want and as long as you have the relationships specified at DSV’s, it should create a role-playing dimension.

Alternatively, you can also create a new dimension. In the Design pane, lower left section, you’ll see dimensions.

Add Dimensions Cube SSAS

select your dimension that you created in step #4 that’s a role-playing dimension from the list and click on OK. And it creates a role-playing dimension for you! So all you had to do was to make sure the relationships are created between the fact and (role-playing) dimension table and SSAS does the rest of the work for you

SSAS Role Playing Dimension


Note: for the demo I didin’t rename the dimension attributes but you do all that good stuff too when you are building the cube for users!


In this post you got started on how to add a role-playing dimension in a SSAS cube.

SSAS Multidimensional cube: How to solve duplicate attribute key error when attribute is non key?


In SSAS multidimensional cube, while processing a cube, you get an error/warning for a non-key “price” column that says:

Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘Table Name’, Column: ‘Price’, Value: ’29′. The attribute is ‘Price’.

You see, if it would complain about a key column having duplicate attribute that would have been fine but it’s complaining about duplicate attribute key when an attribute is non key. How do you solve this?


mostly this is because that the view/table that you are using in your data source views have blanks/nulls.

so in your source, use ISNULL/COALESCE TSQL function before bringing data in the cube OR if it makes sense for you then you can filter out blank data too.

I hope that helps.

SQL Server Reporting Services: How to Solve Divide by Zero Errors?


you thought you wrote an if condition to handle “divide by zero” errors in SSRS but still get the run time error?

Example. my expression is:


And I still get the error:
Divide by Zero Error SSRS

Note: if you do not have integers to divide and you allow data conversion to double, it would show on SSRS as “infinity” instead of “#Error”.


Let’s understand why does it do it?

So even though you are using IIF condition, the vb IIF condition evaluates every function in every expression before it renders and that’s why the “False” condition that I have in my condition gets run which results in #Error.

Ok, armed with that knowledge, let’s solve the problem.

So here’s a modified version of the expression, have a look:

See what we did there! We added one more IIF condition in the “false” condition of the parent IIF.


That should solve the problem:
Divide by zero fix customer code ssrs

There’s also an alternative to this especially if you have a lot of expressions that does this. You can write your custom code and call it SSRSDIVIDE or you can come up with a better name! Here’s a post that talks about how to do that:

In this post, we saw how to solve the divide by zero errors in SSRS.

SSAS Tabular: How to mark a table as Date Table?


For time intelligence in SSAS Tabular models, it’s important to mark a date table. So, how do you do that?


Here are the steps:

1. After you have the data imported in a table, in the model designer, select the date table

2. Also, make sure that necessary relationships are created between date table and fact tables.

3. Now, on the menu bar, select Table > Date > Mark as Date Table

SSAS Tabular mark as date table4. A dialog box should ask you for a unique identifier of the date table, so it could be a date column which has the unique columns for the date table that you imported.

Also, make sure that the unique identifier does not have blank values because the model designer will not allow you to select it as your unique identifier and so you may have to go back to your table properties and put a logic to filter out blank values or make sure that the a row with blank value has a value assigned to it.

After you do that, you’re all set to go! I hope this helps.