Data Quality Service’s Composite Domains in action!

In this post, I’ll show you how composite domains can help you create cross domain rules in Data Quality Services.

Scenario:

You have a data set of employee name, employee category and yearly salary. you want to validate the value in the yearly salary column based on the employee category. Here are the business rules:

Note: for the purpose of the demo, every number is a dollar.

Now, the rule in the Table can be read as:

If employee category is A then yearly salary should be greater than 100000 and less than 200000.

1 composite domains data quality services

Note: I have kept it simple for demo purposes.

Now here is our Data-Set before we set out to validate it:

Employee Name Employee Category Yearly Salary
Jon V Yang A 127000
Eugene L Huang B 90000
Ruben  Torres C 83000
Christy  Zhu D 70000
Elizabeth  Johnson A 90000
Julio  Ruiz C 65000
Janet G Alvarez D 43000
Marco  Mehta B 81000

*Names are taken from Adventure works database. The values in the names and salary column are purely fictional.

Solution:

It’s just an overview, It’s not covered in step by step fashion:

1. Create a KB > created three domains: Employee Category, Employee Name and Yearly Salarly

2. created a composite domain:

2 created a composite domain data quality services

3. Under Composite Domain (CD) Rules Tab:

I started out with defining the rules for category A:

3 create composite domains rules data quality services

And I completed w/ specifying business rules for all four categories

4 create composite domains SQL server 2012

4.  Published KB

5. Created a New DQS project > Selected the KB created above

6.  Selected the data source > Mapped domains

7. I also selected from the list of selected composite domains:

5 view select composite domains data quality project

8. After seeing the cleaning statistics, I switched to the invalid tab to see the records that didn’t match the record:

6 composite domain invalid tab new tab corrected tab correct tab

9. So by now, we have identified records that do not match the rules. A data steward can now correct them if he/she wants to or leave them as it is. Notice the Approve/reject check boxes.

Note that: Not only can you update the yearly salary but you can also update the employee category. So if you think that the employee has been wrongly categorized, you can change that.

10. After this, you can export the data-set which has records that match the business rules and the data-set would be ready to be consumed!

Conclusion:

In this post, we saw how to create cross domain rules using composite domains w/ an example of Employee Category and Yearly Salary.

 

Join #SQLPASS virtual chapters for free online #SQLServer learning!

SQLPASS virtual chapters (VC) provides free sqlserver training year-round:

If you are not signed up already, then consider signing up! With that, And here’s a Quick walk-through on how to Join a VC:

If you do not have a SQLPASS account:

a. Go to http://www.sqlpass.org/RegisterforSQLPASS.aspx

b. Fill up the required information and register

Now, After successful login/registration:

a. Go to http://www.sqlpass.org/MYPASS.aspx

b. switch to MyChapters section

c. Now under virtual chapters, you would see a list of virtual chapters. Join the one’s you are interested in!

my PASS my Chapter Azure VC

How to import data from Excel file to SQL Server:

In this post, we would see how to import data from Excel file to SQL Server using Import and Export Wizard, we would import data from four excel worksheets to SQL server. Here are the steps:

1) Right click the Target Database > Tasks > Import Data to open the SQL Server Import and export wizard

import data ssms sql server import export

2) select the Data Source. In this case, Excel file. You would select the worksheets to fetch data from later, for now – select the excel file as shown below:

import data ssms sql server import export

3) choose the Target Database:

choose the target database import and export systems

4) Select the tables (in our case- worksheets) or specify a query. For the purpose of this demo, we would go with “copy data from one or more table or views” option:

 specify table copy or query

5) Select the worksheets from the Excel file and also specify the tables in the target systems. If the tables in the target database are not already created, then would be created by this wizard:

sql server map source and targetsql server map source and target

6) You have the option of running this wizard immediately or you have the option of creating a SSIS package and then run it later. For the purpose of this post, we would select Run Immediately

7) Review the information and click Finish

sql server review the complete wizard

8) After successful execution:

8 sql server import export execution was successful

Conclusion:

In this blog post, we saw how to import data from Excel file to SQL server.

 

Master Data Services: Analogy to remember what are Models, Entities and Attributes

Let’s understand master data services terminologies using an analogy. We’ll use the world of RDBMS for our analogy:

Terminology in MDS Analogy w/ RDBMS
 Models  SQL server Schema which is used to group logical entities together.
 Entities  Table
Attributes Columns

And here’s a picture that captures the thought: Models contains entities; entities contains Attributes:

master data services models entities attributes

And Here’s the official reference: http://msdn.microsoft.com/en-us/library/ee633763.aspx

Conclusion:

Thinking about the master data services terminologies by drawing an analogy helps me remember them – I hope it helps someone out there too.

Guest Blog: How we use Fuzzy Lookup add-in in our company to solve data inconsistency problems:

This is a Guest Blog from Mantresh Jain.

About Mantresh Jain:

Mantresh Jain is a C Level Executive at SMB in manufacturing domain based out of India. He has bachelor’s degree from a business school. And he holds a special interest in how businesses can leverage newest Information Technology Tools for optimizing business processes. He is working on a company-wide ERP implementation and is a single point of contact for the implementation process. He spends his free time on computer games of all kinds! Link with him here: http://www.linkedin.com/pub/mantresh-jain/43/562/749

 How did they discover Fuzzy Look-up add-in for Excel (A write-up by Paras)?

Some months ago, Mantresh approached me to see if I knew any tool that would help him deal with “messy” data. On Further questions, I learned that

-          Messy data = lots of duplicates

-          Uses SQL Server Express & do NOT have plans to upgrade to SQL Server versions that include Data Quality Services and/or Master Data Services. Remember the context here: They are a small and medium size business.

-          Do use Excel – a lot!

-          Do not have folks w/ “SQL” knowledge

With this requirements, I asked him to see if an add-in for excel called “Fuzzy Lookup” meets their need. After trying it out: here’s Mantresh’s experience of using Fuzzy Lookup add-in for Excel in their organization:

Summary:

In my company we are implementing ERP software. I faced a problem of Data migration from two fox Pro based software’s to SQL (for ERP)

More Details:

Two fox Pro Software’s worked independently form each other. And as a result each of them had their Separate Databases.

Lets Call them FX1 and FX2.

Now I wanted to import Account Master Data from them to SQL, Here are the fields in our Account Master data:

Name, Address, Bank Details, Phone Number among other fields

Problem

Both systems had issues of data Duplication and Data Inconsistency

To give you an example, I faced following problems:

1) FX1 had around 3500 entries and FX2 had 2400 entries

Now in FX1 out of 3500 around 2000 were same as FX2

Also FX2 had around 2000 entries same as FX1

Now i wanted to import only unique Account Master gathered by “combining” the two systems to SQL.

Example:

FX1 has “VMS Industires” while FX2 has “V.M.S Industries”

Solution

Fuzzy Look up add-in for Excel.

Step 1) Import data from both databases to excel

Step 2) Using Fuzzy Look up to find data matching to each other based on variable conditions that we select.

Step 3) It reorganizes data as

FX1 entry 1st matching FX2 Entry
2nd Matching FX2 entry

This is how we  find Duplicate entries and then clean our data-set

Benefit

If not for Fuzzy Look I would have had to manually match each entry to each other which would have taken estimated 60 to 100 Man Hours but with Fuzzy Look-up, we did the job in 24 Man Hours Only.

———————-

Conclusion by Paras:

Thanks Mantresh for sharing your experience!

And here’s a related post:
How to clean similar textual data in Excel via Fuzzy lookup add-in?

Back to basics: What’s CRUD?

Some year’s ago – I got introduced to SQL. At that time, I recall, I was sitting in a lab and one of the first exercises we did was to create a table in a database and adding data in it. In next lab, we ran SQL commands that updated records and deleted few. After we’re done – our instructor told us what we learned were the most basic programming functions i.e CRUD operations

CRUD stands for Create, Read, Update and Delete.

C: Create

R: Read

U: update

D: delete

Let’s see the SQL equivalent of CRUD operations:

Operation SQL
Create INSERT
Read SELECT
Update UPDATE
Delete DELETE

Is the concept of CRUD just applicable to SQL?

No. in fact, if you start learning programming or web development – one of the first things that you get to learn is how to run CRUD operations with that particular language.

Conclusion:

In this blog post, I documented four (4) basic programming functions i.e. Create, Read, Update and Delete.

what are the end – to – end processes for developers to build out a Master Data Services 2012 solution?

I recently completed an end to end tutorial that taught me basics of Master Data Services 2012. And I thought I would take what I’ve learned and create a diagram for reference that helps developers see end – to – end processes that are involved in building out a SQL Server Master Data services 2012 solution. So here it is:

SQL Server 2012 Master data services steps

Resource: Learn to build an Enterprise Information Management system using Data Quality Services, Master Data Services and SQL Server Integration Services

[UPDATE 29 Dec 2012: One more whitepaper that's on the same subject and should be read after the tutorial: Cleanse and Match Master Data by Using EIM ]

Here’s the resource: Tutorial: Enterprise Information Management using SSIS, MDS, and DQS Together

Here’s what you’ll learn:

- Learn to use Data Quality client to create DQS Knowledge Base, cleanse data in an excel data, remove duplicated data from the excel file

DQS client domain values clean data data quality services

- Learn Use MDS add-in for Excel, store the cleansed and matched data in MDS

Master data services add in for excel

- Learn how to use the MDS Web UI:

SQL Server 2012 master data services MDS Web UI

- Learn to Automate the process of receiving input data, cleaning and matching it and storing the master data into MDS via SSIS!

SSIS automate DQS MDS data quality component

That’s about it! check out the Tutorial to learn about all the DQS, MDS & SSIS goodness!

 

How to Solve: SQL Server 2012 Data Quality Services Does not allow you to export the cleansing or matching results to an Excel file if you are using 64-bit version of Excel

While I was working with DQS, I got an error that didn’t allow me to export the cleansing results to an excel file. I searched to find that it was a known issue for the 64 bit version of Excel. So based on what I found, this is how I was able to solve the error:

1. Downloaded and Installed the SQL Server 2012 SP1

Downloaded and Installed the SQL Server 2012 SP1

2. Ran the DQSInstaller.exe -upgrade command in the command prompt

Ran the DQSInsaller.exe -upgrade command in the command prompt

That’s about it. And I was able to export the cleansing results to an 64 bit version of the excel file.

Screenshot 1:

export the cleansing results to an 64 bit version of the excel file.

Screenshot 2:

SQL Server Data Quality Services: How to test Domain Rules in the DQS client.

In the domain management activity of Data Quality Services, you can test the domain rules on sample data while creating the Knowledge base . Here are more details:

1. Create a KB > Select “Domain Management as activity” > create a Domain > Go to the Domain Rules Section of a Domain.

2. For the purpose of this blog-post, I am going to create a domain rule named “Email Validation” which requires the Values to End With @ParasDoshi.com

3. Here’s the Domain Rule. Now click on “Run the selected Domain Rule on Test data” icon:

domain management activity of Data Quality Services, you can test the domain rules on sample data while creating the Knowledge base

4. Now add some test data and then click on “Test the domain rule on all items”.

data quality services sql server test the domain rule

Thus, This way you can build and test the Domain Rule while creating the Knowledge Base.

Conclusion:

In this blog-post, we saw how to test the Domain Rule in the Data Quality Services client of SQL Server 2012