PASS Business Analytics VC: Insider’s Introduction to Microsoft Azure Machine Learning (#AzureML). #sqlpass


Session Abstract:
Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.
Speaker BIO:
Mark is a consultant who provides enterprise data science analytics advice and solutions. He uses Microsoft Azure Machine Learning, Microsoft SQL Server Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint, Power BI, .NET). He is a SQL Server MVP and has a research doctorate (PhD) from Georgia Tech.


Hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader


Back to basics: Multi Class Classification vs Two class classification.

Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):

[Figure: Machine learning predictive algorithms, an introduction]

Now, here’s the difference between Multi Class and Two Class:

If your test data needs to be classified into two classes, then you use a two-class classification model. Examples:


1. Is it going to Rain today? YES or NO

2. Will the buyer renew his soon-to-expire subscription? YES or NO

3. What is the sentiment of this text? Positive OR Negative

As you can see from the examples above, the test data needs to be classified into two classes.

Now, look at example #3: what is the sentiment of the text? What if you also want an additional class called “neutral”? Now there are three classes, and we’ll need to use a multi-class classification model. So, if your test data needs to be classified into more than two classes, then you use a multi-class classification model. Examples:


1. Sentiment analysis of customer reviews? Positive, Negative, Neutral

2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow

I hope the examples helped. Next time you have to choose between multi-class and two-class classification models, ask yourself: does the problem ask you to predict two classes or more? Based on that, you’ll need to pick your model.
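The question "two classes or more?" can be answered straight from the target column. Here is a minimal Python sketch of that check; the function name and the sample label lists are made up for illustration, and the labels are borrowed from the examples above.

```python
def classification_type(labels):
    """Return which kind of classifier a set of target labels calls for."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two distinct classes to classify")
    return "two-class" if n_classes == 2 else "multi-class"


# Target values borrowed from the examples in this post
rain_today = ["YES", "NO", "NO", "YES"]
review_sentiment = ["Positive", "Negative", "Neutral", "Positive"]
weather = ["Sunny", "Cloudy", "Rainy", "Snow"]

print(classification_type(rain_today))        # two-class
print(classification_type(review_sentiment))  # multi-class
print(classification_type(weather))           # multi-class
```

Counting the distinct values in the target column is all it takes; the features play no part in this decision.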

Example: Azure Machine Learning (AzureML) studio’s classifier list:

[Figure: Azure Machine Learning classifiers list]

I hope this helps!

Resource: Introduction to Data Science by Prof Bill Howe, UW

The Introduction to Data Science course taught by Bill Howe has just started on the Coursera platform. Having studied the Data Intensive Computing in Cloud course at UW taught by Prof Bill Howe, I can say that this course would be a great resource too!

Check it out:

Introduction to Data Science

What’s “Naive” about Naive Bayes Machine Learning Algorithm?

In this post, I’ll explain why the “Naive Bayes” machine learning algorithm has the word “Naive” in it.

So here is the short answer:

It “assumes” that the features are independent. (In other words: There’s no relation between the features that are used while building the model)

Let’s go a little deeper:

First up, a few basic pointers.

> It’s a machine learning algorithm used for classification

> It’s based on Bayesian Statistics.

> you can read about it here:

Now, what do we mean when we say that it is Naive because it assumes that features are independent?

Let’s take an example:

Suppose, you are building a “credit card approval” model based on Income and CreditScore

(Side note: for those who do not know what a credit score is, here you go: )

And you have the following columns in the training data (note: in machine learning, think of these columns as features):

Income   CreditScore   Approved
High     High          Yes
High     Medium        Yes
Low      High          Yes
Low      Low           No

Here the features are Income & CreditScore and the target of the classification model is Approved.

In the real world, there’s some relation between “income” and “creditscore”. Agree? Great! But Naive Bayes doesn’t think so. Let me reiterate the point of this blog post and see if it makes more sense now: it assumes that the features are “independent” and that’s why it is Naive!
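To make the independence assumption concrete, here is a minimal Python sketch that scores the four training rows from the table by hand. It is an illustration only, not a production classifier: the function names are made up, and there is no smoothing, so a feature value never seen with a class gets a probability of zero.

```python
from collections import Counter
from fractions import Fraction

# Training data from the table above: (Income, CreditScore, Approved)
rows = [
    ("High", "High", "Yes"),
    ("High", "Medium", "Yes"),
    ("Low", "High", "Yes"),
    ("Low", "Low", "No"),
]


def naive_bayes_scores(income, credit_score):
    """Unnormalized P(class) * P(income | class) * P(credit_score | class)."""
    class_counts = Counter(approved for _, _, approved in rows)
    scores = {}
    for label, n in class_counts.items():
        prior = Fraction(n, len(rows))
        # The "naive" step: each feature is counted on its own, as if Income
        # and CreditScore had nothing to do with each other.
        p_income = Fraction(
            sum(1 for i, _, a in rows if a == label and i == income), n)
        p_score = Fraction(
            sum(1 for _, c, a in rows if a == label and c == credit_score), n)
        scores[label] = prior * p_income * p_score
    return scores


def predict(income, credit_score):
    """Pick the class with the highest naive Bayes score."""
    scores = naive_bayes_scores(income, credit_score)
    return max(scores, key=scores.get)


# (Low, Medium) never appears together in the training data, but the
# independence assumption still lets the model score it feature by feature.
print(naive_bayes_scores("Low", "Medium"))
print(predict("Low", "Medium"))  # Yes
```

For (Low, Medium), the "Yes" score is 3/4 × 1/3 × 1/3 = 1/12 and the "No" score is 1/4 × 1 × 0 = 0. The two per-feature probabilities are simply multiplied together, which is exactly the naive part.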

I hope this helps. Your comments are very welcome!

Steps to install Weka on a desktop running Windows:

Weka is a popular free, open source machine learning tool. In this post, I’ll note the steps that I took to install it on a Windows machine:

1. Search “Download Weka”. As of today, the URL is

2. The page offers several options for downloading Weka. Which one you pick depends on your:

- Machine configuration (x86 vs x64)

- Java version and the corresponding Weka version

So let’s check that:

3. To check the Java version installed on your computer, open a command prompt and type java -version

[Figure: Weka install on Windows]

Note that I have Java version 1.7.

Let’s see if it’s compatible with the Weka version:

[Figure: Weka Java version]

As you can see, the version of Weka that I’ll be installing requires Java 1.7, and I already have that, so for my machine I selected the option:

Click here to download a self-extracting executable without the Java VM

Also remember to check the operating system type (x86 vs x64) and download the corresponding version of Weka.

4. After downloading, install it. I left all the options default.

5. After successful installation, I launched weka by going to:

start > all programs > weka 3.6.9 > weka 3.6.9

[Figure: Weka GUI Chooser]
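If you would rather script the two checks from step 3 (machine type and Java version), here is a small Python sketch. It assumes Python is installed and that java, if present, is on your PATH; the helper name is made up for illustration.

```python
import platform
import shutil
import subprocess


def java_version_text():
    """Return the text printed by `java -version`, or None if java is not found."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr, so check that stream first.
    return (result.stderr or result.stdout).strip()


# The exact machine string varies by system; on 64-bit Windows it is often "AMD64".
print("Machine:", platform.machine())
print("Java:", java_version_text() or "not found on PATH")
```

Compare the reported Java version and machine type against the requirements listed on the Weka download page before picking an installer.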

That’s about it for this post.

See what went into building WATSON, an advanced machine learning & natural language processing system powered by Big Data!

Do you know about the Jeopardy! quiz show where a computer named Watson was able to beat world champions? No? Go watch it! Yes? Nice! Isn’t it a feat as grand as the one achieved by Deep Blue (the chess computer), if not grander?

I am always interested in how such advanced computers were built. In the case of Watson, it’s fascinating how technologies such as natural language processing, machine learning & artificial intelligence, backed by massive compute & storage power, were able to beat two human world champions. And as a person interested in analytics and Big Data, I would classify this technology under Big Data and Advanced Data Analytics, where a computer analyzes lots of data to answer a question asked in natural language. It also uses advanced machine learning algorithms. To that end, if you’re interested in getting an overview of what went into building WATSON, watch this:

If you’re as amazed as I am, consider sharing what amazed you about this technology via the comment section:

Machine Learning VS. Data Mining

For the past couple of months, one of the things that I have thought about is “What is the difference between Machine Learning & Data Mining?” I have studied Data Mining and Advanced Data Mining concepts at both the undergraduate and graduate level, and recently I started learning about Machine Learning via  - I was curious to know the difference between the two similar, inter-related fields. After spending time understanding what Machine Learning is, here’s what I am thinking:

When I learned Data Mining, the focus was on taking a data-set and using one or more algorithms to detect patterns in it. Now that I am studying Machine Learning, we’re asked to write algorithms (and build models). So to me, Data Mining seems to deal with the practical aspects of putting Machine Learning algorithms to use.

When I took Data Mining courses, I didn’t write algorithms; I learned what different Data Mining algorithms can do and what kind of patterns each one helps us find. In the Machine Learning class, my focus is on learning how to write the algorithms (build the model) and optimize them so that they predict well.

Also, in Machine Learning the goal is clear: the questions are mostly like “Build a model from past data that predicts X”. By contrast, I remember that for our graduate-level class, my professor gave our team a data-set of “fatal accident data” and said “Go play with it!”

These were my experiences. What are your experiences with Data Mining and Machine Learning, and how do you differentiate between these two fields, which are similar in more than one way?

[video] Data Science is not NEW – it’s just that we live in a VERY special time!

  • Data Analysis is NOT new
  • Data Mining is NOT new
  • Predictive Analytics is NOT new
  • Machine Learning is NOT new
  • Statistics is NOT new
  • And Data Science is NOT new

So what’s new?

  • The rate at which data is produced.
  • The variety in Data that’s being produced.
  • The “amount” of data that’s being produced.

And we did not have the tools and techniques before, but now we do! Indeed, we live in a VERY special time!

Here’s a nice 5 minute video titled “Data Science: Beyond Intuition”.

Link to video:  And thanks to Ryan Swanstrom for sharing!

Back to basics: What is the difference between Data Analysis and Data Mining?

What is the difference between Data Analysis and Data Mining?

1) One view is that: Data Mining is one particular form of Data Analysis.

[Figure: Difference between data mining and data analysis]

One of the reasons I researched the difference between Data Analysis and Data Mining is that I find the terms are used interchangeably, and now I know why: Data Mining is considered a particular form of Data Analysis.

2) I found another view that says:

Data Analysis is meant to support decision-making, support conclusions, and highlight noteworthy information. So when “analyzing data”, we know what we want: we want answers to support our hypothesis, and we want data in summarized form to highlight useful information.


Data Mining is meant for “knowledge discovery” and “predictions”. So when “mining data”, we look for undefined insights: we want the data to tell us something we didn’t know before, and we want to find patterns in the data that we had not anticipated.



Data Mining: Classification VS Clustering (cluster analysis)

For someone who is new to data mining, classification and clustering can seem similar because both kinds of algorithm essentially “divide” a dataset into sub-datasets. But there is a difference between them, and in this blog post we’ll see exactly that:

Classification:

  • We have a training set containing data that have been previously categorized
  • Based on this training set, the algorithm finds the category that new data points belong to
  • Since a training set exists, we describe this technique as supervised learning
  • Example: We use a training dataset that records which customers have churned. Based on this training set, we can classify whether a customer will churn or not.

Clustering:

  • We do not know the characteristics of similarity of the data in advance
  • Using statistical concepts, we split the dataset into sub-datasets such that each sub-dataset has “similar” data
  • Since a training set is not used, we describe this technique as unsupervised learning
  • Example: We use a dataset of customers and split them into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific segment of customers identified by the clustering algorithm.
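The contrast can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the customer numbers are made up, the classifier is a simple 1-nearest-neighbour rule, and the clustering is a tiny one-dimensional k-means with two groups. Real projects would normally use a library rather than hand-rolled code like this.

```python
def classify(labeled, value):
    """Supervised: copy the label of the closest training example (1-NN)."""
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - value))
    return nearest_label


def cluster(values, iterations=10):
    """Unsupervised: split values into two groups with a tiny 1-D k-means."""
    centroids = [min(values), max(values)]  # simple deterministic start
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return groups


# Made-up monthly spend figures, for illustration only.
# Classification needs labels that someone assigned in advance...
labeled_customers = [(5, "churned"), (8, "churned"), (40, "stayed"), (55, "stayed")]
print(classify(labeled_customers, 10))  # churned

# ...while clustering only sees the raw values and finds the groups itself.
unlabeled_spend = [5, 8, 10, 40, 45, 55]
print(cluster(unlabeled_spend))  # [[5, 8, 10], [40, 45, 55]]
```

Notice that classify needs the "churned"/"stayed" labels to work at all, while cluster never sees a label; it only groups values that sit close together.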

If you want to learn about Data Mining, check out the free book in PDF format: “Mining of Massive Datasets”.