[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)


Presented at #sqlpass summit 2015.

#sqlpass webinar: “Data Analytics Explained for Business Leaders” on 1/15


A quick blog post to let you know about a #sqlpass webinar on 1/15.

Data Analytics Explained for Business Leaders

Thu, Jan 15 2015 12:00 (UTC-05:00) Eastern Time (US & Canada)

RSVP: http://bit.ly/PASSBAVC011515


Description: The world is becoming more efficient. Today, seventy percent of the companies that graced the Fortune 1000 list a mere decade ago have vanished. Agility and survival are a function of innovation, culture, and the ability to predict the future. To that end, data analytics offers a lifeline, a means of survival that will drive productivity and continue to disrupt and redefine business.

However, the resources available to today’s business leaders sit on two vastly different ends of the spectrum: on one end, highly technical academic resources; on the other, largely fluffy overviews of value propositions and potential. The state of the industry shouldn’t be surprising. The same dynamics played out in the early years of the internet. Software providers, technical leaders, and consulting firms greatly benefit from mystifying the world of data analytics into something that is incomprehensible. That lack of conceptual understanding is incredibly risky and propels the cost of analytics initiatives upwards.

This webcast aims to bridge the gap between technical data scientists and business leaders. Ultimately, this understanding will help to:

– Connect the strategic goals of business leaders with the capabilities of technical advisers

– Focus investments and initiatives within analytics and technology

– Distill immensely complex subject matter into comprehensible examples

– Accelerate the path to value and increase the ROI of analytics initiatives

Speaker Bio

Alex is a Predictive Analytics Architect in the Oil and Gas industry with a passion for distilling complexity into insights and evangelizing data science. His work has been featured on KDNuggets and he was recognized by DataScienceCentral as a top 180 blogger in 2014.

RSVP: http://bit.ly/PASSBAVC011515

I hope to see you there!

Examples to help you differentiate between Business Intelligence and Data Science problems:


In this post, I’ll list a few examples from various industries to help you differentiate between business intelligence and data science problems.

Some time back, I blogged about the “Business Analytics Continuum”, and in that post we saw that every organization has data, but organizations use it at different levels depending on their maturity. Excel (or another transactional reporting tool) is usually the starting point for any organization; it helps them see WHAT happened. They then advance to the next stage, where they gain the ability to slice and dice their data to find out WHY, and this capability is usually delivered using Business Intelligence tools & techniques. Once the data culture spreads, thanks to a successful Business Intelligence project, they soon start to outgrow their business intelligence capabilities by asking questions that need predictive capabilities. This is the advanced analytics and Data Science stage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:

Bike Rentals
  Business Intelligence (WHAT & WHY):
    1. How many bikes did we rent in Q3 2014? How does that compare to Q3 2013?
    2. What is the trend of total bike rentals at week level? Can you break it down by geography?
  Data Science & advanced analytics:
    Can you predict bike rentals on an hourly basis?

Credit Risk
  Business Intelligence (WHAT & WHY):
    1. How many customers have a credit risk of ‘C’?
    2. Can you rank the customers that have a credit risk of ‘C’ by their payment due amount?
  Data Science & advanced analytics:
    Can you predict the credit risk of the customer during the contract negotiation stage?

Customer relationship management
  Business Intelligence (WHAT & WHY):
    1. How many account cancellations occurred this year (broken down by month and customer segmentation)?
    2. How does the percentage of account cancellations this year compare to the previous year?
  Data Science & advanced analytics:
    Can you predict customer churn?

Flight Delays
  Business Intelligence (WHAT & WHY):
    1. What is the trend of % of flights delayed this year?
    2. Can you break down flight delays this year by their reasons?
  Data Science & advanced analytics:
    Can you predict whether a scheduled flight will be delayed by more than 15 minutes?

Customer feedback
  Business Intelligence (WHAT & WHY):
    1. What is the customer satisfaction % trend this year?
    2. What is the customer satisfaction % broken down by customer segments and product segments?
  Data Science & advanced analytics:
    Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?

I hope this helps!

PASS Business Analytics VC: Insider’s Introduction to Microsoft Azure Machine Learning (#AzureML). #sqlpass


RSVP: http://bit.ly/PASSBAVC091814

Session Abstract:
Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.
Speaker BIO:
Mark is a consultant who provides enterprise data science analytics advice and solutions. He uses Microsoft Azure Machine Learning, Microsoft SQL Server Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint, Power BI, .NET). He is a SQL Server MVP and has a research doctorate (PhD) from Georgia Tech.

RSVP: http://bit.ly/PASSBAVC091814

Hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader


Back to basics: Multi Class Classification vs Two class classification.


Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):

[Image: introduction to machine learning predictive algorithms]

Now, here’s the difference between Multi Class and Two Class:

If your test data needs to be classified into two classes, then you use a two-class classification model.


1. Is it going to rain today? YES or NO

2. Will the buyer renew his soon-to-expire subscription? YES or NO

3. What is the sentiment of this text? Positive OR Negative

As you can see from the above examples, the test data needs to be classified into two classes.

Now, look at example #3: what is the sentiment of the text? What if you also want an additional class called “neutral”? Now there are three classes, and we’ll need to use a multi-class classification model. So, if your test data needs to be classified into more than two classes, then you use a multi-class classification model.


1. Sentiment analysis of customer reviews? Positive, Negative, Neutral

2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow

I hope the examples helped. Next time you have to choose between multi-class and two-class classification models, ask yourself: does the problem ask you to predict two classes, or more than two? Based on that, you’ll pick your model.
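To make that question concrete, here is a minimal Python sketch. It is only an illustration: the function name classification_type is made up for this post, and all it does is count the distinct labels you want to predict.

```python
def classification_type(labels):
    """Return 'two-class' or 'multi-class' from the distinct labels to predict.

    Hypothetical helper for illustration; it only counts distinct classes.
    """
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("a classification problem needs at least two classes")
    return "two-class" if n_classes == 2 else "multi-class"


if __name__ == "__main__":
    # Rain today? YES or NO
    print(classification_type(["YES", "NO", "YES"]))
    # Sentiment with an extra "Neutral" class
    print(classification_type(["Positive", "Negative", "Neutral"]))
```

The first call corresponds to the rain example and the second to the sentiment example with “neutral” added.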

Example: Azure Machine Learning (AzureML) studio’s classifier list:

[Image: Azure Machine Learning classifiers list]

I hope this helps!

Resource: Introduction to Data Science by Prof Bill Howe, UW


The Introduction to Data Science course taught by Bill Howe just started on the Coursera platform. Having studied the Data Intensive Computing in Cloud course at UW, also taught by Prof Bill Howe, I can say that this course would be a great resource too!

Check it out: https://www.coursera.org/course/datasci


What’s “Naive” about Naive Bayes Machine Learning Algorithm?


In this post, I’ll explain why the “Naive Bayes” machine learning algorithm has the word “Naive” in its name.

So here is the short answer:

It “assumes” that the features are independent. (In other words: There’s no relation between the features that are used while building the model)

Let’s go a little deeper:

First up, a few basic pointers.

> It’s a machine learning algorithm used for classification.

> It’s based on Bayesian statistics.

> You can read about it here: http://en.wikipedia.org/wiki/Naive_Bayes_classifier

Now, what does it mean to say that it is Naive because it assumes that the features are independent?

Let’s take an example:

Suppose you are building a “credit card approval” model based on Income and CreditScore.

(SideNote: For those who do not know what a credit score is, here you go: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States)

And you have the following columns in the training data (Note: In machine learning, think of these columns as features):

Income   CreditScore   Approved
High     High          Yes
High     Medium        Yes
Low      High          Yes
Low      Low           No

Here the features are Income & CreditScore and the target of the classification model is Approved.

In the real world, there’s some relation between “income” and “creditscore”. Agree? Great! But Naive Bayes doesn’t think so. Let me reiterate the point of this blog post and see if it makes more sense now: it assumes that the features are “independent”, and that’s why it is Naive!

I hope this helps. Your comments are very welcome!
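To see where the independence assumption shows up in practice, here is a minimal Python sketch that scores the four training rows from the table above by hand. It is a toy illustration, not a production classifier: the function names are made up for this post, and there is no smoothing, so a feature value never seen with a class gives that class a score of zero.

```python
from collections import Counter

# Training rows from the table above: (Income, CreditScore, Approved)
ROWS = [
    ("High", "High", "Yes"),
    ("High", "Medium", "Yes"),
    ("Low", "High", "Yes"),
    ("Low", "Low", "No"),
]


def naive_bayes_scores(income, credit_score, rows=ROWS):
    """Return unnormalized P(label) * P(income | label) * P(credit_score | label)."""
    label_counts = Counter(label for _, _, label in rows)
    scores = {}
    for label, n in label_counts.items():
        prior = n / len(rows)
        p_income = sum(1 for i, _, l in rows if l == label and i == income) / n
        p_score = sum(1 for _, c, l in rows if l == label and c == credit_score) / n
        # The "naive" step: the two likelihoods are multiplied as if
        # Income and CreditScore were independent given the label.
        scores[label] = prior * p_income * p_score
    return scores


def predict(income, credit_score):
    """Pick the label with the highest naive Bayes score."""
    scores = naive_bayes_scores(income, credit_score)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(naive_bayes_scores("Low", "High"))
    print(predict("Low", "High"))
```

The multiplication inside the loop is the whole story: the model never looks at how Income and CreditScore move together, only at how each one relates to Approved on its own.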

Steps to Install Weka on desktop running windows OS:


Weka is a popular, free, open source machine learning tool. In this post, I’ll note the steps that I took to install it on a Windows machine:

1. Search for “Download Weka”. As of today, the URL is http://www.cs.waikato.ac.nz/ml/weka/downloading.html

2. The page offers several download options. Which one to pick depends on your:

– Machine configuration (x86 vs x64)

– Java version and the corresponding Weka version

So let’s check that:

3. To check the Java version installed on your computer, open up the command prompt and type java -version

[Image: Weka install on Windows]

Note that I have Java version 1.7.

Let’s see if it’s compatible with the Weka version:

[Image: Weka Java version]

As you can see, the version of Weka that I’ll be installing requires Java 1.7 and I already have that, so for my machine I selected the option:

Click here to download a self-extracting executable without the Java VM

Also remember to check the operating system type (x86 vs x64) and download the corresponding version of Weka.

4. After downloading, install it. I left all the options default.

5. After successful installation, I launched weka by going to:

start > all programs > weka 3.6.9 > weka 3.6.9

[Image: Weka GUI Chooser]
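If you would rather script the checks from step 3 than read them off the command prompt, here is a minimal Python sketch. It is not part of the steps I followed: it assumes Python is installed, and the helper names parse_java_version and installed_java_version are made up for this post.

```python
import platform
import re
import subprocess


def parse_java_version(text):
    """Extract the quoted version string from `java -version` output, if any."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and return the version string, or None if Java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # `java -version` writes its banner to stderr, so check both streams.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java version:", installed_java_version())
    # Machine type as reported by the OS, useful for the x86 vs x64 choice.
    print("Machine type:", platform.machine())
```

The machine type is printed as the operating system reports it, so you still need to map that value to the x86 or x64 download yourself.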

That’s about it for this post.

See what went into building WATSON, an advanced machine learning & natural language processing system powered by Big Data!


Do you know about the Jeopardy! quiz show where a computer named Watson was able to beat world champions? No? Go watch it! Yes? Nice! Isn’t it a feat at least as grand as the one achieved by Deep Blue (the chess computer)?

I am always interested in how such advanced computers are built. In the case of Watson, it’s fascinating how technologies such as natural language processing, machine learning & artificial intelligence, backed by massive compute & storage power, were able to beat two human world champions. As a person interested in analytics and Big Data, I would classify this technology under Big Data and Advanced Data Analytics, where a computer analyzes lots of data to answer a question asked in a natural language. It also uses advanced machine learning algorithms. To that end, if you’re interested in getting an overview of what went into building WATSON, watch this:

If you’re as amazed as I am, consider sharing what amazed you about this technology via the comment section.

Machine Learning VS. Data Mining


For the past couple of months, one of the things that I have thought about is “What is the difference between Machine Learning & Data Mining?” I have studied data mining and advanced data mining concepts at both the undergraduate and graduate level, and recently I started learning about machine learning via Coursera.org, so I was curious to know the difference between the two similar, inter-related fields. After spending time understanding what machine learning is, here’s what I am thinking:

When I learned data mining, the focus was on taking a data-set and using one or more algorithms to detect patterns in it. Now I am studying machine learning, and here we’re asked to write algorithms (and build models). So to me, data mining seems to deal with the practical aspects of putting machine learning algorithms to use.

When I took data mining courses, I didn’t write algorithms. Instead, I learned what different data mining algorithms can do and what kind of patterns each algorithm helps us find. In the machine learning class, my focus is to learn how to write the algorithms (build the model) and optimize them so that they predict well.

Also, in machine learning the goal is clear: the questions are mostly like “Build a model from past data that predicts X”. In contrast, I remember that for our graduate-level class, my professor gave our team a data-set of “fatal accident data” and said “Go play with it!”

These were my experiences. What are your experiences with data mining and machine learning, and how do you differentiate between these two fields, which are similar in more ways than one?