Examples to help you differentiate between Business Intelligence and Data Science problems:

In this post, I’ll list few examples from various industries to help you differentiate between business intelligence and data science problems.

Sometime back, I blogged about “Business Analytics Continuum” and in the post we saw that Every Organization has DATA but they use their business data at different levels because of their maturity level. Excel (or other transactional reporting tools) is usually the starting point for any organization – it helps them see WHAT happened. They advance to the next stage, where they get capabilities to slice and dice their data – To find out WHY – and usually this capability is delivered using Business Intelligence tools & techniques. Once the data culture spreads – Thanks to a successful Business Intelligence project – then they soon start to outgrow their business intelligence capabilities by asking problems that need predictive capabilities. This is advanced analytics and Data Science stage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:

Business Intelligence.(WHAT & WHY) Data Science & advanced analytics.
Bike Rentals
  1. How many bikes did we rent in Q3 2014? How does that compare to Q3 2013?
  2. What is the trend of total bike rentals at week level? Can you break it down by geography?
Can you predict bike rentals on an hourly basis?
Credit Risk
  1. How many customers have a credit risk of ‘C’?
  2. Can you rank customers by their payments due amount that have a credit risk ‘C’?
Can you predict the credit risk of the customer during contract negotiations stage?
Customer relationship management
  1. How many account cancellations occurred this year (broken down by month and customer segmentation)?
  2. How does percentage of account cancellations this year compare to that previous year?
 Can you predict customer churn?
Flight Delays
  1. What is the trend of % of flight delayed this year?
  2. Can you break down flight delays this year by their reasons?
Can you predict whether a scheduled flight will be delayed by more than 15 minutes?
Customer feedback
  1. What is the customer satisfaction % trend this year?
  2. What is the customer satisfaction % broken down by customer segments and product segments?
Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?

I hope this helps!

PASS Business Analytics VC: Insider’s Introduction to Microsoft Azure Machine Learning (#AzureML). #sqlpass

RSVP: http://bit.ly/PASSBAVC091814


Session Abstract:
Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.
Speaker BIO:
Mark is a consultant who provides enterprise data science analytics advice and solutions. He uses Microsoft Azure Machine Learning, Microsoft SQL Server Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint, Power BI, .NET). He is a SQL Server MVP and has a research doctorate (PhD) from Georgia Tech.

RSVP: http://bit.ly/PASSBAVC091814

Hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

 

Back to basics: Multi Class Classification vs Two class classification.

Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):

Machine Learning Predictive Algorithms analytics Introduction

Now, here’s the difference between Multi Class and Two Class:

if your Test Data needs to be classified into two classes then you use a two-class classification model.

Examples:

1. Is it going to Rain today? YES or NO

2. Will the buyer renew his soon-to-expire subscription? YES or NO

3. What is the sentiment of this text? Positive OR Negative

As you can see from above examples the test data needs to be classified in two classes.

Now, look at example #3 – What is the sentiment of the text? What if you also want an additional class called “neutral” – so now there are three classes and we’ll need to use a multi-class classification model. So, If your test data needs to be classified into more than two classes then you use a multi-class classification model.

Examples:

1. Sentiment analysis of customer reviews? Positive, Negative, Neutral

2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow

I hope the examples helped, so next time you have to choose between multi class and two class classification models, ask yourself – does the problem ask you to predict two classes or more? based on that, you’ll need to pick your model.

Example: Azure Machine Learning (AzureML) studio’s classifier list:

Azure Machine Learning classifiers list

I hope this helps!

What is the purpose of creating Tables & Graphs?

Knowing why we do what we do is important. Stephen Few lists four reason for creating Tables & Graphs in his book “Show me the number”. I really liked them so I am posting it here for your reference:

  1. it helps us communicate. It helps present information to others.
  2. it helps us analyze data. it helps us find the insights in the data.
  3. It helps us Monitor Performance. It helps us keep track information about performance e.g. Sales Performance, Speed of Manufacturing, etc.
  4. It helps us Plan. It helps us predict and prepare for the future.

Exploring, filtering and shaping web-based public data using Data Explorer Excel add-in:

Data Explorer let’s you “Explore” (search) for web-based public data. This is a great way to combine data that you may have in your data-sources with public data sources for data analysis purposes. Sometimes your data might not tell you the reason behind the observed trends, so when that happens – you can try to see if a public data-set might give you the much-needed context. Let me give you an Example before we start hands-on w/ data explorer so that you have better understanding of importance of public datasets. Here’s a sample that I found here. So, Here’s a demo:

An auto company is seeing sales trends of Hybrid cars and SUV’s from the sales data-sources. But what is the reason behind that? company data does not show that. Someone hypothesizes that it might be because of gas prices. So they test out the hypothesis by combining gas prices information available via public data. And turns out gas prices might be the driving force of sales trends! SEE:

if the gas prices increase, then the sale of SUV go down and the sale of Hybrids go up:

data analysis combine data with public datasets

You know that public data can be helpful! So how can you search for public data-sets? Well, You can manually search online, ask someone, browse through public data repositories like azure data market (and other data markets), there’s also a public data search engine! OR you can directly search for them via Data Explorer.

Here are the steps:

1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”

excel public data search data explorer2) I selected one of the data-sets that said “Tallest completed building…. “

excel data from internet

3) Now let’s do some filtering and shaping. Here are the requirements:

- Hide columns: Image, notes & key

- clean columns that has heights data

- Show only city name in location

OK, let’s get to this one by one!

4) Hiding Columns:

Click on Filter & Shape button from the Query Settings:

excel data shaping cleaning

Select Image Column > Right Click > Hide:

excel hide remove columns

Repeat the steps for notes & key column.

Click on DONE

5) clean column that has heights data.

Click on Filter & Shape to open the query editor

A) let’s rename it. Select column > Right Click > Rename to Height > press ENTER

B) let’s remove the values in brackets. Select Column > right click > split column > By delimiter > At each occurrence of the delimiter > Custom and enter “(” > OK

excel split a columnThis should transform the data like this:

excel data explorer split a column

Hide height.2 and rename the height.1 to height

Click on DONE

6) Let’s just have city names in the location column

click on Filter & shape to load query editor:

A) select location > right click > split column > by delimiter > Custom – Enter: ° in the text box like this:

an excel split by delimiter dataclick on OK

Hide Location.2, Location.3, Location.4 & Location.5

Select Location.1 > Right Click > Split Column > by Number of characters > Number of characters: 2 > Once, as far right as possible > OK

cleaning data in excel shaping filtering

Hide Location.1.2 and rename Location.1.1 to Location

One last thing! making sure that the data type of height is numbers.

Select height > change type > number

Also,

Select floors > change type > number

click on DONE. Here’s our filtered and shaped data!

filter data excel shape clean

7) LET”S VISUALIZE IT!

For the purpose of visualization I copied first 20 rows to a separate excel sheet and created a chart:

z excel data visualization

That’s about it for this post. Here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in
Web Scraping Tables using Excel add-in Data Explorer preview:

Your comments are very welcome!

Data Reporting ≠ Data Analysis

One of the key thing I’ve learned is importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:

data analysis and data reporting

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis” then you need reports. But you do not have to necessarily do data analysis if you want to do data reporting.

From a process standpoint, Here’s how you can visualize Data Reporting and Data Analysis:

data analysis and data reporting process

Let’s thing about this for a moment: Why do we need “analysis”?

We need it because TOOLS are really great at generating data reports. But it requires a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of seeing the data points and translating them into business insights is core of what is Data Analysis. Here’s how it looks visually:

Data analysis Data Reporting

Note after performing data analysis, we have information like Trends and Insights, Action items or Recommendations, Estimated impact on business that creates business value.

Conclusion:

Data Reporting ≠ Data Analysis

Resource: 12 recorded sessions from the 24hop business analytics edition are online! #passbac #msbi

Recently, PASS hosted a 24hop business analytics event:

And now, the 12 one hour sessions ranging from data visualization, predictive analytics to Big Data are online for you to watch! They also serve as “Trailer” for what you can expect at the PASS Business Analytics conference!

Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx

And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!

Also, I participated in the twitter contest (by Microsoft BI) that was happening along w/ the event – and this is what I got for my win!

24 hop twitter contest prize

hoodie w/ embedded earphones!

That’s about it for this post. Enjoy the recordings!

Found something interesting by exploring a “List of companies by revenue” Data Set:

I like exploring data sets to find interesting patterns from them. To that end, I was exploring a data-set: List of companies by revenue and I added a column to calculate Revenue/Employee to explore the dataset:

And I found an outlier!

Here’s the outlier: Exor

Here’s what it’s interesting:

It’s revenue in 2012 is: 109.15 billion USD

And number of employees is just 40!

Just think of Revenue/Employee !

To put things in perspective, Lets Compare that with its neighbor in the data-set:

Rank | Company | Industry | Revenue in USD billion | Employees

48 Koch Industries Conglomerate 110.00 60000.00
49 EXOR Investment 109.15 40.00
50 Cardinal Health Pharmaceuticals 107.55 40000.00
51 CVS Caremark Retail 107.10 202000.00
52 IBM Computer services 106.92 433362.00

I got to know about this by quickly creating a data visualization to explore the data-set:

list of companies by revenue

And removing Trafigura, Vitol and Exor, this is what we have:

power view excel 2013 rank revenue employees

Observation: oil and gas industry have relatively higher revenue/employee ration.

That’s about it for this post. Thanks for reading about my data exploration!

Quick Note about SAS’s acroynm SEMMA:

SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I had recently posted about the Data Mining & Knowledge Discovery Process which had following sequential steps:

Raw Data => cleaning => sampling => Modeling => Testing

SEMMA follows the similar sequential steps as we had seen in the data mining process. So while Data Mining process is applicable to any data mining tool out their, SEMMA helps when you use SAS enterprise miner. In fact, it has helped me quickly find the data mining functions available in SAS tool:

sas sample explore modify model assess

Back to basics: Data Mining and Knowledge Discovery Process

Once in a while I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over past few years. Today, I want to revisit Data Mining and Knowledge Discovery Process:

Here are the steps:

1) Raw Data

2) Data Pre processing (cleaning, sampling, transformation, integration etc)

3) Modeling (Building a Data Mining Model)

4) Testing the Model a.k.a assessing the Model

5) Knowledge Discovery

Here is the visualization:

knowledge discovery process data miningAdditional Note:

In the world of Data Mining and Knowledge discovery, we’re looking for a specific type of intelligence from the data which is Patterns. This is important because patterns tend to repeat and so if we find patterns from our data, we can predict/forecast that such things can happen in future.

Conclusion:

In this blog post, we saw the Knowledge Discovery and Data Mining process.