SQL Server Query Fundamentals: A Simple example of a Query that uses PIVOT:

Problem:

Convert the following source data into a schema shown below:

SQL SERVER TSQL PIVOTSolution:

Here’s the code that uses PIVOT function to get to the solution, please use this as a starting point.

Note the use of aggregation function avg – this will depend on the requirement. In the example, the Test_value need to be average if more than one tests were performed.


-- source data
SELECT [Product_ID],[Test_Desc],[Test_Val] FROM [dbo].[Address]
go

-- Destination data using PIVOT function
select * from [dbo].[Address]
pivot( avg(test_val) for test_Desc IN (Test1,Test2,Test3,Test4,Test5)) 
as Tests

SSIS: Using Data Profiling Task to check the candidate key profile of unknown data source(s)

As a part of Business Intelligence projects, we spend a significant amount in extracting, transforming and loading data from source systems. So it’s always helpful to know as much as you can about the data sources like NULLS, keys, statistics among other things. One of the things that I like to do if the data is unknown is to make sure that I get the candidate keys correct to make sure the key used can uniquely identify the rows in the data. It’s really helpful if you do this upfront because it would avoid a lot of duplicate value errors in your projects.

So here’s a quick tutorial on how you can check the candidate key profile using data profiling task in SSIS, You need to perform two main tasks:
1. Generate the xml file using the Data profiling task in SSIS
2. View the content of the xml file using the Data Profile Viewer Tool or using the Open Profile Viewer option in the Data Profiling task editor in SSIS.

Here are the steps:
1a. Open SQL Server Data Tools (Visual Studio/BIDS) and the SSIS project type
1b. Bring in Data Profiling Task on Control Flow
1c. Open the Data Profiler Task editor and configure the destination folder that the tasks uses to create the XML file. You can either create a new connection or use an existing one. If you use an existing connection, make sure that you are setting the OverwriteDestination property to True if you want the file to be overwritten at the destination.

1 SSIS Data Profiling Task Data Cleaning Candidate Key

1d. Click on Quick Profile to configure the data source for the data profiler task

2 SSIS Data Profiling Task Data Cleaning Candidate Key

1e. In the quick profile form, you’ll need to select the connection, table/view and also specify what you to need to computer. For candidate key profile, make sure that the candidate key profile box is checked.

3 SSIS Data Profiling Task Data Cleaning Candidate Key

1f. Run the Task and a XML file should be placed at the destination you specified in step 1C.

Now, It’s time to view what profiler captured.

2a. you can open “Data Profile Viewer” by searching for its name in the start button.

4 SSIS Data Profiling Task Data Cleaning Candidate Key

2b. once it opens up, click on open and browse to the xml file generated by the data profiling task.

5 SSIS Data Profiling Task Data Cleaning Candidate Key

2c. once the file opens up, you can the candidate key profiles.

6 SSIS Data Profiling Task Data Cleaning Candidate Key

2d. Alternatively, You can also open the data profile viewer from the “Data Profiling Task” in SSIS. Go to the Editor > Open Profile Viewer:

7 SSIS Data Profiling Task Data Cleaning Candidate Key

Conclusion:
In this post, you saw how to profile data using the Data Profiling Task in SSIS.

PASS Business Analytics VC: 7 Ideas on Encouraging Advanced Analytics by Mark Tabladillo #sqlpass

Thu, Jul 17, 2014 12:00 PM – 1:00 PM EDT


Abstract:
Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.

Bio:
Mark Tabladillo is a Microsoft MVP and SAS expert based in Atlanta, GA. His Industrial Engineering doctorate (including applied statistics) is from Georgia Tech. Today, he helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He has spoken at major conferences including Microsoft TechEd, PASS Summit, PASS Business Analytics Conference, Predictive Analytics World, and SAS Global Forum. He tweets @marktabnet and blogs at http://marktab.net.

REGISTER HERE: bit.ly/PASSBAVC071714

hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

TSQL – Quick note about numeric data type to solve “Arithmetic overflow error”

Problem:

You are working on a query where you are trying to convert source data to numeric data type and you get an “Arithmetic overflow error”.

Solution:

Let’s understand this with an example:

Here’s the source data: 132.56000000 and you want to store just 132.56 so write a query that looks like:

cast( as numeric(3,2)) as destination_column_name

and after you run the query its throws an error “Arithmetic Overflow Error” – so what’s wrong?

The issue is that you incorrectly specified the precision and scale – by writing the query that says numeric(3,2) you are saying I want 3 data places with 2 on the right (after decimal point) which leaves just 1 place for left.

what you need to write is numeric(5,2) – and this will have 2 places on the right and leaves 3 places for left.

so after you run this, it shouldn’t complain about the arithmetic overflow error. you just need to make sure that the precision and scale of the numeric data type is correct.

Conclusion:

In this post, you saw an example of how to correctly use the precision and scale in the numeric data type and that should help you solve the arithmetic overflow errors.

SSIS – How to use Execute SQL Task to assign value to a variable?

Problem:

How to use Execute SQL Task in SSIS to assign value to a variable?

Solution:

This is a beginner level post so I’ll show you how you can use Execute SQL Task to assign a value to a variable. Note that variables can also be given full result set. With that said, here are the steps:

1. Create the query against the source system

Example: ((Note the column name, this will be handy later!)

1 Execute SQL Task SSIS Query

2. Open SSIS Project > Create the variable

Example

Variable SSIS Create Steps3. Now, drag a Execute SQL Task to Control Flow. Rename it. And go to Edit. Configure SQL Statement Section

Execute SQL Statement SSIS4. Now, since we want to store a value to the variable, change the Result Set property to Single Row

Single ROW SQL Statement Server SSIS5. One last step, go to result set section and map Result Name (remember the column name from #1?!) with Variable Name:

Result Set SSIS Execute SQL TaskThat’s it! Related article: How to see value of variable during Run Time?

Conclusion:

In this post, you saw how to use Execute SQL Task in SQL server integration services to assign a value to a variable.

SQL server Integration services: How to solve “The value violated the integrity constraints for the column” error?

problem:

you are working on an SSIS package to load a table from a source system and you get an error “The value violated the integrity constraints for the column error” – how do you solve it?

solution:

one the things that the error message should also tell you would be column name. What you want to do is check the table definition of the destination table for any integrity constraints like NOT NULL or PRIMARY KEY. Now once you have that information, go back to your source and figure out if it’s trying to add NULL values in a column that has NOT NULL integrity constraint. Or may be ETL logic is trying to insert duplicate value to the column that has primary key constraint.

Also, the don’t alter the destination table to accept NULL’s or remove integrity constraint. You want to put a logic in your ETL OR fix the data integrity at source. You can use TSQL functions like NULLIF to handle NULL values while querying source systems.

Conclusion:
In this post, we saw how to solve the “The value violated the integrity constraints for the column” error in SSIS.

SSRS: How do I know where a report item is on the report?

Have you ever got a SSRS message saying something like textbox107 refers to an invalid measure or measure used in gaugepanel7 is not valid, etc? The problem is how do you find that “Textbox107” report item on your SSRS report? After all, you might have lots of text boxes on your report.

Here’s how you can know where a report item is on the report:

Step 1: Go to view > Properties Window

Step 2: Open the report.

Step 3: if you select a report item then it’s going to show the properties of the report item in the properties pane.

SSRS Properties Window Report ItemStep 4: And you can select a report item by it’s name like textbox107 or gaugepanel11 for which you are seeing an error, that way you can find the location of the report item and troubleshoot it!

SSRS Properties Window Report Item list

 Conclusion:

I hope this tip helps you find the report item on your SSRS reports.

How to use TSQL checksum to compare data between two tables?

In any BI project, data validation plays an important part. You want to make sure that the data is right! usually business helps in this validation. As a developer, you might also want to do some preliminary data validation. One of the techniques that I’ve learned recently is to use TSQL checksum to compare data between two tables. In this post , I’ll describe the technique & post a pseudo code.

we’ll create a pseudo code to compare all columns but you should be able to use that to tweak that if you need it.

1) Run checksum(*) on Tables:

On Table1:

select checksum(*) as chktb1 from table1
go

On Table 2:

select checksum(*) as chktb2 from table2
go

At this point, you should get two result sets each populated by checksum values computer over all columns since you passed * in the checksum function.

2) Now let’s join these tables & look at rows w/ different checksum: (in other words, it is going to list all rows that are different between table1 & table2)

select * from
(
select checksum(*) as chktb1 from table1
) as tb1
left join
(
select checksum(*) as chktb2 from table2
) as tb2
on tb1.someid=tb2.someid /* you can have more ids */
where tb1.chktb1 != tb2.chktb2

3) You can add individual column now to see what changed:

select * from
(
select checksum(*) as chktb1, columnname1, columnname2 from table1
) as tb1
left join
(
select checksum(*) as chktb2, columnname1, columnname2 from table2
) as tb2
on tb1.someid=tb2.someid
where tb1.chktb1 != tb2.chktb2

Conclusion:
I hope this helps especially if you don’t have rights to install 3rd party tools on your dev machine.

TSQL Script: How to get list of all tables or view in a database?

I was documenting the list of tables/views in a data mart & staging databases & I found the following scripts useful:

TSQL To get list of all tables:

SELECT * FROM <DatabaseName>.information_schema.tables
where TABLE_TYPE='BASE TABLE'

TSQL To get list of all views:

SELECT * FROM <DatabaseName>.information_schema.tables
where TABLE_TYPE='VIEW'

Alternatives (for SQL 2005 onwards):

SELECT * FROM SYS.TABLES

SQL Spatial Data Types: A Tip to fix STIntersects Method returning NULL values.

I was working on some queries that used SQL Spatial data types and ran into a problem where STIntersects function was returning NULL. I was expecting it to say 1 or 0, but it kept on returning NULL. Here’s how I solved it:

1. On Technet article for STIntersects, it said “This method always returns null if the spatial reference IDs (SRIDs) of the geometry instances do not match.

here’s my pseudo code for you:
– – Function
– – Input Parameters @Area Geometry, @Point1 varchar(10), @Point2 varchar(10)
DECLARE @p point;
SET @p = geometry::STGeomFromText(‘POINT(@Point1 @Point2)’, 0);
– -Note that the 0 in the above line is SRID
return(SELECT @p.STIntersects(@area));

2. So next step was to make sure that the two geometry instances that I was using had same SRID’s.

3. Since I was passing the @area (geometry data type) to the function, I had to check the SRID for @area.
so wrote a single line of code that used the STSrid function.

Select @area.STSrid

That told me that the SRID of the geometric data type was 4xxx

4. So now I modified the original code to change the SRID of @p variable to same as that of @area variable.

So here’s the updated code:
DECLARE @p point;
SET @p = geometry::STGeomFromText(‘POINT(@Point1 @Point2)’, 4xxx);
return(SELECT @p.STIntersects(@area));

5. Now since the SRID of @area and @p matched, it started returning 1 or 0 values as expected.